Preface


These proceedings contain the papers presented at the 1992
Symposium on Interactive 3D Graphics held at the Royal
Sonesta Hotel in Cambridge, Massachusetts on March 29 -
April 1, 1992.

The symposium focuses on innovative 3D graphics architectures
and hardware, algorithms for generating visual, haptic
and auditory output, perceptual and psychological issues of
viewing and operating in complex virtual spaces, interactive
simulations distributed over local and long-haul networks,
real-time dynamics, and innovative human-machine interface
technologies and paradigms.

The call for participation was written in April, 1991, distributed
at Siggraph '91, and disseminated throughout the graphics
community. The deadline for submission of extended abstracts
was September 18, 1991 at 5:00pm. In keeping with the
rule applied for Siggraph conferences, this deadline was
strictly enforced. On September 19, the 69 submitted abstracts
were scanned by program co-chairs Ed Catmull and Marc
Levoy and distributed to a committee consisting of 24 prominent
researchers from the graphics, human-computer interaction,
and psychology research communities. Each abstract
received at least four reviews, and many received five. On
October 28, the program committee met at Stanford University
and selected 30 papers to be published in the proceedings
and presented at the symposium. Submissions were accepted
either as short or long papers (4 pages or 12 pages respectively)
and were designated as short or long symposium presentations
(15 minutes or 25 minutes respectively).

To ensure a lively symposium and close interaction among the
participants, attendance was limited to under 200 participants,
and the program was spiced with frequent panels, live demonstrations,
and social events. We were also privileged to have
as our keynote speaker Andries van Dam, 1991 recipient of the
Steven A. Coons Award for Outstanding Creative Contributions
to Computer Graphics, and as our capstone speaker
Stuart Card of Xerox PARC.

There are many people without whose volunteer efforts this
symposium could not have succeeded. The chairs would first
of all like to thank the members of the program committee for
their reviews, their hard day’s work at Stanford, and their
numerous suggestions on the format of this and future symposia:

Kurt Akeley, Silicon Graphics
Norm Badler, U. of Pennsylvania
Eric Bier, Xerox PARC
Elaine Cohen, U. of Utah
Tom DeFanti, U. of Illinois - Chicago
Tony DeRose, U. of Washington
Tom Ferrin, U. of California at San Francisco
Alain Fournier, U. of British Columbia
Henry Fuchs, U. of N. Carolina at Chapel Hill
Paul Haeberli, Silicon Graphics
Pat Hanrahan, Princeton University
Paul Heckbert, U. of California at Berkeley
Leo Hourvitz, NeXT Computer
S. Kicha Ganapathy, AT&T Bell Labs
Margaret Minsky, MIT
Eben Ostby, Pixar
Alex Pentland, MIT
Rich Riesenfeld, U. of Utah
Carlo Sequin, U. of California at Berkeley
Spencer Thomas, University of Michigan
Brian Wandell, Stanford University
Lance Williams, Apple Computer
Andrew Witkin, Carnegie Mellon University
Mike Zyda, Naval Postgraduate School

Special thanks are due to Dee Bell of Pixar, whose organizational
skills kept the work flowing smoothly throughout her
advancing pregnancy, Kay Seirup of Pixar, who picked up the
torch and formatted these proceedings when Dee’s pregnancy
became maternity, and Rhea Zdimal of Stanford, who organized
the program committee meeting with skill and style. In
Boston, Janette Noss of the MIT Media Lab provided administrative
and organizational support, and handled an infinity of
details for the symposium itself. Greg Tucker, also of the
Media Lab, valiantly provided support for the audio/visual and
demonstration equipment.

We thank Judy Brown and Steve Cunningham for their help
in obtaining ACM SIGGRAPH "in cooperation" status and
publication of these proceedings. Thanks to Nicholas
Negroponte and the MIT Media Lab for providing generous
support for color reproduction in the proceedings. In addition,
we also wish to acknowledge the generous contributions of the
following organizations:

Office  of  Naval  Research 
National  Science  Foundation 
USA  Ballistic  Research  Laboratory 
Hewlett-Packard 
Silicon  Graphics 
Sun  Microsystems 

It has been a privilege to work with such an enthusiastic and
dedicated crowd of people. Although it is only December as
these proceedings go to press, inquiries concerning registration
have been running at fever pitch. As with the previous two
symposia, the strict attendance limit has generated controversy
and occasionally disappointment, but the program committee
feels that the small size and narrow focus of the
symposium are keys to its continuing success. We anticipate
a provocative and inspiring symposium in March, and we look
forward to many repetitions in the coming years.


David  Zeltzer,  Symposium  Chair 

Ed  Catmull  and  Marc  Levoy,  Program  Co-Chairs 

December  1991 
Escaping Flatland in User Interface Design
Andries  van  Dam,  Brown  University 


Fast and inexpensive computers and many productivity-enhancing
applications have made computer users of a significant
percentage of our population, professional and casual users
alike. And the advances in ease of learning and ease of use
made possible by modern user interfaces have helped immeasurably
in this process. These superior interfaces, made
possible by hardware such as bitmap graphics and the mouse,
depend on the contributions of user interface designers who
have created a new design discipline with its own tools and
methodologies.

Hardware advances continue unabated, and decreases by
a factor of two in price/performance occur almost yearly.
Multimedia is today’s buzzword and the hardware support for
it, as usual, outpaces the software to exploit it. Low-level
hardware support for 3D realtime shaded graphics is already
built into a commodity CPU chip (the Intel i860) and will soon
not just be part of workstations specialized for the nascent 3D
market but be integrated into entry-level workstations and,
shortly thereafter, personal computers. Indeed, distinctions
between workstations and personal computers will all but
disappear as they share more and more hardware and software
features. As forecast by Raj Reddy and others, affordable 3G
machines (gigaIPS/FLOPS, gigabyte of main memory,
gigabaud communication) will appear on our desks before the
end of the decade. They will also smooth-shade the equivalent
of 1 megapolygon/sec. Thus multimedia support and 3D
graphics will finally become mainstream, integrated into every
desktop computer.

What new opportunities do these exciting technology
developments make possible?

Can we expect paradigm shifts in computing akin to
those arising from Xerox PARC’s pioneering work on bitmap
workstations in the early seventies?

Those attending this symposium understand the importance
and potential of 3D. They will therefore not be surprised
by my claim that one of the next major frontiers in computing
is the introduction of realtime 3D graphics into existing
everyday applications and the creation of new 3D applications.
The eighties were the decade in which computers and 2D
graphics finally became fast enough to run a host of 2D
interactive applications. These include drawing/painting programs,
WYSIWYG word processors and desktop publishing
programs. The nineties will see a rapidly growing set of
interactive 3D applications, both the traditional applications
for specialists (e.g., 3D CAD/CAM, scientific visualization)
and those for both professional and casual users (e.g., 3D
illustration and animation programs, interior design and
walkthrough programs). There need not even be a ‘killer
application’ for 3D, akin to 2D’s spreadsheets or
word processing, to justify its importance as a new dimension
in computer applications; I believe 3D will be found useful in
many applications today considered 2D.


Spreadsheets are 2-1/2D already, and Xerox PARC has
used 3D widgets that exploit real-time animation to visualize
data that is not intrinsically spatial, let alone 3D. CASE tools
that provide program and algorithm visualization will reap the
same benefits from realtime 3D graphics that science and
engineering obtain from scientific visualization technology
today. Electronic books, to be used, for example, for technical
documentation, education and entertainment, will contain
‘interactive illustrations’, i.e., user-controlled, model-driven,
real-time animation, in addition to video. Many of these
models will be 3D virtual worlds: 2D illustrations can then
become an important special case of the more general 3D
illustrations.

While 3D has been prevalent for many years in such fields
as mechanical CAD/CAM and scientific visualization, even in
such applications the user interface has been largely 2D:
menus, dialogue boxes, sliders, etc. There are surprisingly few
3D widgets beyond 3D cursors, virtual sphere simulations of
3D joysticks, and gestural selection, translation and rotation.
Why this paucity? One reason is that until very
recently 3D has been unavailable to interface and application
designers except on specialized, expensive platforms. Another
reason is that 3D (and realtime animation) introduce not
only new modalities of use but also new complexities. Furthermore,
user interface designers have not had 3D toolkits for
constructing 3D widgets. Finally, little research has been done
thus far on creating new 3D metaphors and interaction paradigms.
Even virtual reality research has had to focus on using
and improving the still primitive hardware technology. Yet it
is necessary not just for input and output hardware to continue
to evolve dramatically: it is equally important that we stretch
our imagination to think of new ways of interacting with our
objects and data items and their interrelationships.

Among the issues that arise in designing 3D interfaces are
the tradeoffs between direct manipulation and indirect manipulation
through a widget. Direct manipulation involves
widgets that have behavior but little or no geometry, such as
gestural control for selection, translation and rotation. Indirect
manipulation is done using 2D and 3D widgets that have both
geometry and behavior such as object handles in a drawing
program. Such widgets abstract out the salient parameters of
the objects to be manipulated and/or of the operations themselves.
Another issue is the separation between interface and
application objects. Current user interface design favors
separating widgets from the objects they control. Such widgets
are constructed with their own design tools. In our paper in
these proceedings we advocate making widgets first-class
objects in the same environment that contains the application
objects, and constructed with the same tools. Examples of our
3D widgets will be shown, which we hope will stimulate the
3D research community to consider realtime 3D not only as a
technology or application domain but also as a means for
creating engaging, productive user interfaces.
Management of Large Amounts of Data in
Interactive Building Walkthroughs

Thomas  A.  Funkhouser,  Carlo  H.  Sequin  and  Seth  J.  Teller 
University of California at Berkeley


Abstract


We describe techniques for managing large amounts of data
during an interactive walkthrough of an architectural model.
These techniques are based on a spatial subdivision, visibility
analysis, and a display database containing objects described
at multiple levels of detail. In each frame of the walkthrough,
we compute a set of objects to render, i.e. those potentially
visible from the observer’s viewpoint, and a set of objects to
swap into memory, i.e. those that might become visible in
the near future. We choose an appropriate level of detail at
which to store and to render each object, possibly using very
simple representations for objects that appear small to the
observer, thereby saving space and time. Using these techniques,
we cull away large portions of the model that are irrelevant
from the observer’s viewpoint, and thereby achieve
interactive frame rates.


CR Categories and Subject Descriptors:

[Information Systems]: H.2.8 Database Applications.
[Computer Graphics]: I.3.5 Computational Geometry and
Object Modeling - geometric algorithms, languages, and
systems; I.3.7 Three-Dimensional Graphics and Realism -
visible line/surface algorithms.

Additional Key Words and Phrases: architectural simulation,
virtual reality.

1 Introduction

Interactive computer programs that simulate the experience
of “walking” through a building interior are useful for visualization
and evaluation of building models before they
are constructed. However, realistic-looking building models
with furniture may consist of tens of millions of polygons
and require gigabytes of data - far more than today’s workstations
can render at interactive frame rates or fit into memory
simultaneously. In order to achieve interactive walkthroughs
of such large building models, a system must store in memory
and render only a small portion of the model in each
frame; that is, the portion seen by the observer. As the observer
“walks” through the model, some parts of the model
become visible and others become invisible; some objects
appear larger and others appear smaller. The challenge is to
identify the relevant portions of the model, swap them into
memory and render them at interactive frame rates (at least
ten frames per second) as the observer’s viewpoint is moved
under user control.

Using the design of Soda Hall, a planned computer science
building at UC Berkeley, as a test object, we have completed
the first version of a system that supports interactive
walkthroughs of large, fully furnished building models. Our
system builds upon pioneering work by Airey and Brooks
[1,2,5] and uses conceptual ideas going back to Jones [8] and
Clark [6]. The special features of our system are 1) a hierarchical
display database that describes the building model
as a set of objects represented at multiple levels of detail;
2) a spatial subdivision and visibility analysis in which the
building model is divided into cells, and cell-to-cell and
cell-to-object visibility information is computed; 3) a real-time
memory management algorithm for swapping objects in and
out of memory as the observer moves through the model; and
4) a real-time refresh algorithm for choosing which objects
to render at which levels of detail in each frame.

1.1  System  Overview 

Our system is divided into three distinct phases as shown in
Figure 1. First, during the modeling phase, we construct the
building model from AutoCAD floor plans and elevations,
and populate the model with furniture. Next, during the
precomputation phase, we perform a spatial subdivision and
observer-independent lighting and visibility calculations. Finally,
during the walkthrough phase, we simulate an observer
moving through the building model under user control with
the mouse, rendering the model as seen from the observer’s
viewpoint in each frame. The display database is the link
between these three phases. It stores the complete building
model, along with the results of the precomputation phase,
for use during the walkthrough phase.


[Figure 1: The three phases of the system (modeling, precomputation, walkthrough).]


2  Modeling  Phase 

Our walkthrough system requires a detailed 3D model of a
building, complete with furniture and realistic material and
lighting information.

We first convert the raw 2.5D model received from the architects
in AutoCAD DXF format [3] into a consistent 3D
representation in Berkeley UNIGRAFIX format [10]. Unfortunately,
the raw architectural models that we received
were not true three-dimensional models and contained non-planar
faces, coincident coplanar faces, improper face intersections,
and inconsistent face orientations. During conversion,
our programs [9] detect and automatically correct many
of these anomalies. Any remaining modeling errors are corrected
manually using interactive tools.

We then populate the architectural model with stairs, furniture
and other objects that a user would expect to find in a typical
building. We have generated highly detailed descriptions
for several pieces of furniture using interactive modeling
programs and received others from Greg Ward of Lawrence
Berkeley Laboratories. We place instances of these objects
into the building model using both automatic and interactive
placement programs. We have written several programs
that automatically place objects into specific types of rooms
based on sets of parameters. For instance, the “conference
room generator” places a rectangular or elliptical table in the
middle of a room, chairs all around it, a blackboard on one
wall, a transparency projector on the table, and so on. The
“office generator” places a desk against one wall, a chair in
front of the desk, some bookshelves against the walls, and so
on. Numerous parameters are available for the user to control
the size, number and placement of objects with each of these
programs. We have also written a program for interactively
placing objects into a three-dimensional model. It allows a
user to add, delete, or move object instances with real-time
visual feedback.

Gradually, we load the walls and furniture of the building
model into the walkthrough display database. The display
database represents the building model as a set of objects
(e.g. walls, desks, chairs, telephones, pencils, etc.),
each of which can be described at multiple levels of detail
[6]. We construct less detailed representations of objects
from the highly detailed originals using an interactive design
tool that allows a user to simplify 3D objects by deleting
and merging vertices and faces. For instance, we construct
five representations of a desk: 1) a highly detailed desk with
faces subdivided along gradients of radiosity, 2) a slightly
less-detailed desk with simple handles and larger faces, 3)
an even less-detailed desk without any handles at all, 4) a
coarsely detailed desk with only legs and drawers, and 5) a
simple box. These object abstraction hierarchies are adjusted
interactively so that transitions between levels are barely noticeable
as one zooms closer to an object and detail is refined.
Levels of detail are chosen dynamically during the interactive
walkthrough phase to improve refresh rates and memory
utilization.
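
The dynamic choice among such levels can be illustrated with a minimal sketch. It assumes, as the abstract suggests, that an object's apparent size to the observer drives the choice. The function name `choose_level`, the use of a bounding-sphere radius, and the threshold values are hypothetical and are not taken from the system described here.

```python
import math


def choose_level(object_radius, distance, thresholds):
    """Pick a level-of-detail index from an object's apparent size.

    object_radius: radius of a bounding sphere (assumed representation).
    distance: distance from the observer to the object's center (> 0).
    thresholds: ascending apparent-size cutoffs in radians (hypothetical).
    Returns 0 for the coarsest level, len(thresholds) for the finest.
    """
    # Angle subtended by the bounding sphere as seen by the observer.
    apparent = 2.0 * math.atan2(object_radius, distance)
    level = 0
    for cutoff in thresholds:
        if apparent >= cutoff:
            level += 1
    return level


# Four cutoffs give five levels, matching the five desk representations.
cutoffs = [0.05, 0.1, 0.2, 0.4]
far_level = choose_level(1.0, 100.0, cutoffs)
mid_level = choose_level(1.0, 10.0, cutoffs)
near_level = choose_level(1.0, 1.0, cutoffs)
```

With these illustrative cutoffs a distant desk falls back to the simple box, while a desk next to the observer is drawn at full detail.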

So  far,  wc  have  built  a  completely  furnished  model  of  the 
sixth  floor  of  Soda  Hall,  the  planned  computer  science  build¬ 
ing  at  U.C.  Berkeley.  This  floor  model  has  a  total  of  2,320 
objects,  represented  at  up  to  five  levels  of  detail,  and  contains 
over  400,000  faces,  requiring  68MB  of  storage.  Color  Plate 
I  shows  a  top-view  of  the  model. 


3  The  Precomputation  Phase 

After the complete building model has been loaded into the
display database, we distribute the model into a spatial subdivision
and perform a visibility analysis of the model cells
and objects. The resulting information is stored in the display
database for use by the display and memory management algorithms
during the walkthrough phase.

3.1  Spatial  Subdivision 

We subdivide the model using a variant of the k-D tree
data structure [4]. Splitting planes are introduced along
the major opaque elements in the model, namely the walls,
door frames, floors, and ceilings (details are given in [11]).
The subdivision terminates when all sufficiently large, axial
opaque elements in the model are coplanar with an axial
boundary plane of at least one subdivision leaf cell.

After subdivision, cell portals (i.e., the transparent portions
of shared boundaries) are identified and stored with
each leaf cell, along with an identifier for the neighboring cell
to which the portal leads (Figure 2). Enumerating the portals
in this way amounts to constructing an adjacency graph over
the leaf cells of the subdivision; two leaves (nodes) are adjacent
(share an edge) if and only if there is a portal connecting
them. All the visibility computations to be described exploit
the adjacency graph data structure.
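
As an illustration only, the adjacency graph can be sketched as a mapping from each leaf cell to the portals on its boundary. The tuple layout `(cell_a, cell_b, portal_id)` and the function name `build_adjacency` are assumptions made for this example, not the representation used in the display database.

```python
from collections import defaultdict


def build_adjacency(portals):
    """Build the cell adjacency graph from a list of portals.

    portals: iterable of (cell_a, cell_b, portal_id) tuples, one per
    portal (hypothetical layout). Returns a dict mapping each cell to
    a list of (portal_id, neighbor_cell) pairs.
    """
    graph = defaultdict(list)
    for cell_a, cell_b, portal_id in portals:
        # A portal connects two leaf cells, so record it on both sides.
        graph[cell_a].append((portal_id, cell_b))
        graph[cell_b].append((portal_id, cell_a))
    return graph


# A corridor B with three rooms opening onto it.
portals = [("A", "B", 0), ("B", "C", 1), ("B", "D", 2)]
graph = build_adjacency(portals)
neighbors_of_b = sorted(cell for _, cell in graph["B"])
```

Two cells are adjacent in this structure exactly when some portal lists both of them.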

This procedure can be applied quickly. At the cost of performing
an initial O(n lg n) sort, the split dimension and abscissa
can be determined in time O(f) at each split, where f
is the number of faces stored with the node. We have found
that these subdivision criteria yield a tree whose cell structure
reflects the “rooms” of our architectural model. For our floor
model with 1920 split faces, the subdivision created 1280
cells and 3600 portals in 23 seconds.

3.2 Cell-to-Cell Visibility

Once the spatial subdivision has been constructed, we compute
and store cell-to-cell visibility for each leaf cell, i.e. the
set of cells visible to an observer able to look in all directions
from any position within the cell. The cell-to-cell visibility
for a cell C contains exactly those cells to which an
unobstructed sightline leads from C. Such a sightline must
be disjoint from any opaque elements and must intersect, or
stab, a portal in order to pass from one cell to the next (Figure 2).
Sightlines connecting cells that are not immediate
neighbors must traverse a portal sequence, each member of
which lies on the boundary of an intervening cell. We have
implemented a procedure that finds sightlines through axial
portal sequences, or determines that no such sightline exists,
in O(n lg n) time, where n is the number of portals in the
sequence [7].


Figure 2: Stabbing an axial portal sequence in three dimensions.

We compute the cell-to-cell visibility by constructing a
stab tree for each leaf cell C of the subdivision [11] as shown
in Figure 3. Each node of the stab tree corresponds to a cell
visible from C; each edge of the stab tree corresponds to a
portal stabbed as part of a portal sequence originating on a
boundary of C. The stab tree is constructed incrementally using
a constrained depth-first search on the adjacency graph.
As each cell is encountered by the depth-first search, it is
effectively marked “visible” by its inclusion into the source
cell’s stab tree. For any source cell C, we say that a cell R is
reached if R is in C’s cell-to-cell visibility set.
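
The constrained depth-first search can be sketched as follows. This is a minimal illustration under stated assumptions: the graph layout (cell mapped to `(portal_id, neighbor)` pairs), the function name `build_stab_tree`, and the `can_stab` callback are all hypothetical. The real stabbing test is the sightline procedure of [7]; the toy predicate below merely limits sequence length so the example can run.

```python
def build_stab_tree(graph, source, can_stab):
    """Collect the cells reached from `source` and the stab-tree edges.

    graph: dict mapping a cell to a list of (portal_id, neighbor) pairs.
    can_stab: callable taking a tuple of portal ids and returning True
    if some sightline stabs every portal in the sequence (stand-in for
    the actual geometric test).
    Returns (visible_cells, edges), where each edge is a
    (parent_cell, portal_id, child_cell) triple.
    """
    visible = {source}
    edges = []

    def visit(cell, sequence):
        for portal_id, neighbor in graph.get(cell, []):
            if portal_id in sequence:
                continue  # never re-cross a portal already in the sequence
            extended = sequence + (portal_id,)
            if not can_stab(extended):
                continue  # constraint fails: prune this branch
            visible.add(neighbor)
            edges.append((cell, portal_id, neighbor))
            visit(neighbor, extended)

    visit(source, ())
    return visible, edges


# Four cells in a row: A -0- B -1- C -2- D.
row = {
    "A": [(0, "B")],
    "B": [(0, "A"), (1, "C")],
    "C": [(1, "B"), (2, "D")],
    "D": [(2, "C")],
}
toy_can_stab = lambda sequence: len(sequence) <= 2  # placeholder test only
visible, edges = build_stab_tree(row, "A", toy_can_stab)
```

Pruning as soon as a sequence cannot be stabbed keeps the search from exploring cells that no sightline from the source can reach along that path.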


3.3  Cell-to-Object  Visibility 

Cells that are immediate neighbors of the source cell are entirely
visible to it, since the eyepoint can be placed on the
shared portal. Cells farther away from the source, however,
are in general only partially visible to an observer in the
source cell. This is due to the fact that, as the length of a
portal sequence increases, the collection of lines stabbing the
entire sequence typically narrows.

Casting the sightline search as a graph traversal yields a
simple method for computing the partially visible portion of
each reached cell. First, the traversal orients each portal encountered,
since the portal is traversed in a known direction.
Thus each portal contributes a “lefthand” and a “righthand”
constraint to the set of sightlines stabbing the sequence. The
result, after stepping through n portals in the plane, is a
bowtie-shaped bundle of lines that stabs every portal of the
sequence, and which “fans out” beyond the final portal into
an infinite wedge. This wedge can then be clipped to the
boundary of the reached cell. In our three-dimensional models,
all portals are axial rectangles, so any portal sequence
can generate at most three pairs of bowtie constraints (one
from each collection of portal edges parallel to the x, y, and
z axes). Color Plate II depicts the clipped polyhedral wedges
for a source cell in three dimensions.

We define cell-to-object visibility as the set of objects that
can be seen by an observer constrained to a given source cell
C (but, again, free to move anywhere in C and look in any
direction). For each reached cell R, we compute a superset
of C’s cell-to-object visibility in R by assembling a set of
halfspaces bounding the portion of R visible from C. We
then store with C those objects in R that are completely or
partially inside the assembled halfspaces. One special case
exists: all objects in C’s neighbor cells are tagged as visible
from C without any bowtie computations.
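
A minimal sketch of the halfspace test follows. It assumes each object is approximated by an axis-aligned bounding box and each halfspace is given as coefficients `(a, b, c, d)` with the inside defined by a*x + b*y + c*z + d >= 0. Both conventions and the function name `box_may_be_visible` are assumptions for illustration. The test is conservative: it rejects a box only when it lies wholly outside one halfspace, so it may keep a box that misses the wedge, which is consistent with computing a superset.

```python
def box_may_be_visible(box_min, box_max, halfspaces):
    """Conservatively test a bounding box against a set of halfspaces.

    box_min, box_max: (x, y, z) corners of an axis-aligned box.
    halfspaces: iterable of (a, b, c, d); a point is inside when
    a*x + b*y + c*z + d >= 0 (assumed convention).
    Returns False only if the box is entirely outside some halfspace.
    """
    for a, b, c, d in halfspaces:
        # Pick the box corner that lies farthest along the plane normal.
        x = box_max[0] if a >= 0 else box_min[0]
        y = box_max[1] if b >= 0 else box_min[1]
        z = box_max[2] if c >= 0 else box_min[2]
        if a * x + b * y + c * z + d < 0:
            return False  # even the best corner is outside
    return True


# A wedge opening along +x: x >= 0, y <= x, y >= -x.
wedge = [(1, 0, 0, 0), (1, -1, 0, 0), (1, 1, 0, 0)]
inside = box_may_be_visible((2, -1, 0), (3, 1, 1), wedge)
outside = box_may_be_visible((1, 5, 0), (2, 6, 1), wedge)
```

Objects that pass the test would be stored with the source cell; the rest are skipped for that reached cell.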

Figure 5 depicts this process in two dimensions, using a
simplified floorplan of our three-dimensional test model. The
objects found potentially visible from the source (the filled
squares in Figure 5) are associated with the source cell and
reached cell in a compacted representation of the stab tree.
Later, in the interactive walkthrough phase, this object list
will be retrieved and culled dynamically based on the observer’s
position and view direction.
Figure 3: Cell-to-cell visibility and stab tree.


4  The  Display  Database 

The results of the modeling and precomputation phases are
stored in a display database designed specifically to identify
and swap relevant objects into memory quickly as the observer
moves through the model during the interactive walkthrough
phase. The structure of the display database is shown
in Figure 6.
Figure 6: A structural diagram of the display database showing
entities (boxes) and relationships (diamonds).


Figure  4:  In  general,  only  a  fraction  of  the  reached  cell  is 
visible  to  the  source. 


Figure 5: Computing cell-to-object visibility; the filled
squares are marked visible.


4.1  Segments 

All entities (e.g. cells, portals, objects, etc.) are stored in
segments in the display database. A segment is simply an
abstraction for a variable-sized contiguous group of bytes
in a display database file that can be read and released as
a unit. Each segment is represented by its size, a byte offset
into a file, and a pointer into memory, as shown in Figure 7.
The arrangement of bytes in a segment is identical in memory
and on disk so that only pointers within a segment must
be updated when a segment is read (requiring one addition
per pointer); there is no need to allocate extra memory or to
move or copy bytes. With these properties, segments can be
swapped quickly in and out of memory.
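
The pointer update on read can be illustrated with a small sketch. It assumes that pointers inside a segment are stored on disk as 64-bit little-endian offsets from the start of the segment, and that the positions of those pointer slots are known. The slot list, the encoding, and the function name `relocate` are hypothetical and chosen only for this example.

```python
import struct


def relocate(segment, pointer_slots, base_address):
    """Turn segment-relative offsets into absolute addresses in place.

    segment: bytearray holding the segment exactly as stored on disk.
    pointer_slots: byte positions of the pointer fields (assumed known).
    base_address: address at which the segment now resides.
    """
    for slot in pointer_slots:
        (offset,) = struct.unpack_from("<Q", segment, slot)
        # One addition per pointer; no other bytes are moved or copied.
        struct.pack_into("<Q", segment, slot, base_address + offset)


# A 24-byte segment whose first two fields point at offsets 16 and 8.
segment = bytearray(24)
struct.pack_into("<Q", segment, 0, 16)
struct.pack_into("<Q", segment, 8, 8)
relocate(segment, [0, 8], 0x1000)
first_pointer = struct.unpack_from("<Q", segment, 0)[0]
second_pointer = struct.unpack_from("<Q", segment, 8)[0]
```

Because the rest of the segment is left untouched, the cost of bringing a segment into memory is the read itself plus one addition per pointer.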

All relationships (e.g. adjacent, incident, visible, etc.) are
stored in segment references in the display database. A segment
reference can be represented by either an integer segment
ID (if it has not yet been read into memory) or a pointer
to a segment’s data in memory. At any time, a segment reference
may be read (converted from an ID to a pointer) or released
(converted from a pointer to an ID). A reference count
is stored with each segment so that segments can be read and
released through multiple segment references quickly and
transparently.
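
The read and release operations with reference counting might be sketched as below. The class names `Segment`, `SegmentRef`, and `Database`, and the use of an in-memory `bytes` object in place of the display database file, are assumptions made so that the example is self-contained.

```python
class Segment:
    """Bookkeeping for one segment: location on disk plus resident data."""

    def __init__(self, seg_id, offset, size):
        self.seg_id = seg_id
        self.offset = offset
        self.size = size
        self.data = None      # None while the segment is not resident
        self.ref_count = 0


class SegmentRef:
    """Holds either an integer segment ID or the resident Segment."""

    def __init__(self, seg_id):
        self.target = seg_id


class Database:
    def __init__(self, blob, table):
        self.blob = blob      # stand-in for the display database file
        self.table = table    # seg_id -> Segment

    def read(self, ref):
        """Convert a reference from an ID to a pointer; load if needed."""
        if isinstance(ref.target, int):
            seg = self.table[ref.target]
            if seg.ref_count == 0:
                seg.data = self.blob[seg.offset:seg.offset + seg.size]
            seg.ref_count += 1
            ref.target = seg
        return ref.target.data

    def release(self, ref):
        """Convert a reference back to an ID; free on the last release."""
        if not isinstance(ref.target, int):
            seg = ref.target
            seg.ref_count -= 1
            if seg.ref_count == 0:
                seg.data = None
            ref.target = seg.seg_id


blob = b"cellAdeskB"
db = Database(blob, {7: Segment(7, offset=5, size=5)})
first, second = SegmentRef(7), SegmentRef(7)
data = db.read(first)
db.read(second)
db.release(first)
still_resident = db.table[7].data is not None
db.release(second)
```

The segment stays resident while any reference to it is in the pointer state and is dropped only when the last such reference is released.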
Figure 7: The implementation of display database segments.


4.2  Layout 

Since the latency overhead of each read operation is relatively
large, we group the segments for all objects incident
upon the same cell contiguously in the display database file.
This layout allows us to utilize the cell-to-cell visibility information
from the precomputation phase to load groups of
objects (those likely to become visible at the same time) into
memory in a single I/O operation. If an object is incident upon
more than one cell (i.e. straddles a cell boundary), then we
store it redundantly, once for each cell.

Furthermore,  we  store  descriptions  of  all  objects  incident 
upon  the  same  cell  at  the  same  level  of  detail  contiguously  in 
the  display  database,  as  shown  in  Figure  8.  Within  a  single 
cell,  the  object  headers  appear  first,  followed  by  descriptions 
of  the  objects  at  increasing  levels  of  detail.  As  a  result,  all 
objects  incident  upon  a  cell  at  or  up  to  any  level  of  detail 
may  be  read  at  once  in  a  single  read  operation  during  the 
interactive  walkthrough  phase. 
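
Because headers come first and levels of detail follow in increasing order, everything needed up to a given level is a prefix of the cell's region in the file. A minimal sketch of that arithmetic, assuming hypothetical byte counts for the header block and for each level (the function name and parameters are illustrative only):

```python
from typing import List, Tuple


def cell_read_span(cell_offset: int, header_bytes: int,
                   level_bytes: List[int], max_level: int) -> Tuple[int, int]:
    """Return (offset, length) of the one read covering the object headers
    plus levels of detail 0..max_level for a single cell."""
    if not 0 <= max_level < len(level_bytes):
        raise ValueError("max_level out of range")
    length = header_bytes + sum(level_bytes[:max_level + 1])
    return cell_offset, length
```

For a cell stored at byte 1000 with a 64-byte header block and levels of 100, 400 and 1600 bytes, reading up to level 1 is a single read of 564 bytes starting at offset 1000.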


5  The  Walkthrough  Phase 

During the walkthrough phase, we simulate an observer moving
through the architectural model under user control. The goal is
to render the model as seen from the observer’s viewpoint in a
window on the workstation display at interactive frame rates as
the user moves the observer’s viewpoint through the model.

Figure 8: The layout of objects incident upon the same cell
in the display database.

The  primary  problem  is  that  building  models  are  very  large 
and  so  1)  do  not  fit  into  memory,  and  2)  cannot  be  rendered 
completely  in  an  interactive  frame  time.  Thus  we  must  identify
a  small,  but  relevant,  portion  of  the  model  to  store  in
memory  and  to  render  in  each  frame.  We  use  the  results  of  the 
visibility  precomputation  along  with  the  object  hierarchy  of 
the  display  database  and  dynamic  culling  algorithms  to  iden¬ 
tify  which  objects  are  visible  to  the  observer,  and  choose  an 
appropriate  level  of  detail  for  each  one.  We  load  into  mem¬ 
ory  and  render  only  relevant  levels  of  detail  for  potentially 
visible  objects. 

5.1  Display  Management 

We  use  two  techniques  to  reduce  the  amount  of  data  rendered 
in  each  frame:  1)  we  compute  the  subset  of  objects  visible  to 
the  observer  using  a  real-time  visibility  analysis  based  on  the 
results  of  the  precomputation  phase,  and  2)  we  choose  an  ap¬ 
propriate  level  of  detail  at  which  to  render  each  visible  object 
from  the  object  hierarchy  constructed  during  the  modeling 
phase.  Using  these  techniques,  we  are  able  to  cull  away  large 
portions  of  the  model  that  are  irrelevant  from  the  observer’s 
viewpoint,  and  therefore  achieve  much  shorter  refresh  times. 
Moreover,  computations  are  done  in  parallel  with  the  display 
of  the  previous  frame  and  do  not  increase  the  effective  frame 
time. 

Visibility  Analysis 

To compute the set of objects to render for a given observer
viewpoint, we first identify the cell containing the observer’s
position and fetch its cell-to-object visibility from the
display database. Since the cell-to-object visibility contains
all objects visible from any viewpoint in a given cell, it is
always a superset of the objects actually visible to a
particular observer in that cell. It is typically a small
subset of the entire model.

Since the observer is at a known point and has vision limited
to a view cone emanating from this point, we can cull the set
of visible objects even further. We define the eye-to-cell
visibility as the set of all objects incident upon any cell
partially or completely visible to the observer (the light
stippled regions in Figure 9). Clearly, the eye-to-cell
visibility is also a superset of the objects actually visible
to the observer. The visible area in any cell is always the
intersection of that (convex) cell with one or more (convex)
wedges emanating through portals from the eyepoint. To compute
the eye-to-cell visibility, we initialize the visible area
wedge to the interior of the view cone, and the eye-to-cell
visibility to the source cell. Next, we perform a constrained
depth-first search (DFS) of the stab tree, starting at the
source cell, and propagating outward. Upon encountering a
portal, the wedge is suitably narrowed, and the newly reached
cell is added to the eye-to-cell visibility set. If the wedge
is disjoint from the portal, the active branch of the DFS is
terminated.
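
The constrained search can be illustrated in two dimensions. The sketch below is a simplification under several assumptions that are not part of the system described here: it walks a plain cell adjacency map rather than the precomputed stab tree, portals are 2D line segments, the view cone spans less than 180 degrees, and the function names (narrow, eye_to_cell) are hypothetical. The wedge is kept as a pair of boundary directions and is narrowed by clipping each portal segment against them.

```python
from typing import Dict, List, Optional, Set, Tuple

Vec = Tuple[float, float]
Portal = Tuple[Vec, Vec, str]            # (endpoint, endpoint, neighbouring cell)


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1])


def cross(a: Vec, b: Vec) -> float:
    return a[0] * b[1] - a[1] * b[0]


def narrow(eye: Vec, right: Vec, left: Vec,
           p: Vec, q: Vec) -> Optional[Tuple[Vec, Vec]]:
    """Clip portal pq against the wedge; return the narrowed (right, left)
    boundary directions, or None if the wedge misses the portal."""
    t0, t1 = 0.0, 1.0
    tests = (lambda x: cross(right, sub(x, eye)),   # >= 0: on or left of right ray
             lambda x: cross(sub(x, eye), left))    # >= 0: on or right of left ray
    for f in tests:
        fp, fq = f(p), f(q)
        if fp < 0 and fq < 0:
            return None                             # portal entirely outside
        if fp < 0:
            t0 = max(t0, fp / (fp - fq))
        elif fq < 0:
            t1 = min(t1, fp / (fp - fq))
    if t0 >= t1:
        return None
    a = (p[0] + t0 * (q[0] - p[0]), p[1] + t0 * (q[1] - p[1]))
    b = (p[0] + t1 * (q[0] - p[0]), p[1] + t1 * (q[1] - p[1]))
    da, db = sub(a, eye), sub(b, eye)
    if cross(da, db) < 0:
        da, db = db, da                             # keep (right, left) order
    if cross(da, db) == 0:
        return None                                 # portal seen edge-on
    return da, db


def eye_to_cell(eye: Vec, right: Vec, left: Vec, source: str,
                portals: Dict[str, List[Portal]]) -> Set[str]:
    """Depth-first search outward from the source cell, narrowing the wedge
    at every portal and pruning branches whose wedge misses the portal."""
    visible = {source}

    def dfs(cell: str, r: Vec, l: Vec, path: Set[str]) -> None:
        for p, q, nxt in portals.get(cell, []):
            if nxt in path:
                continue                            # do not walk back along the path
            wedge = narrow(eye, r, l, p, q)
            if wedge is None:
                continue
            visible.add(nxt)
            dfs(nxt, wedge[0], wedge[1], path | {nxt})

    dfs(source, right, left, {source})
    return visible
```

The objects to consider are then those incident upon the returned cells; intersecting that set with the cell-to-object visibility of the source cell gives the narrower estimate discussed next.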

Finally, we estimate the eye-to-object visibility, a narrower
superset of the objects actually visible to the observer, by
generating the intersection of the cell-to-object and
eye-to-cell sets. For example, consider the observer viewpoint
shown in Figure 9. The eye-to-object visibility set (filled
squares) contains all objects in the intersection between the
cell-to-object (all squares) and eye-to-cell (gray regions)
sets. It is a small subset of all objects in the model, but
still an over-estimate of the actual visibility of the
observer. In Figure 9, only one square lies in a cell visible
to the observer and can be seen from some point inside the cell
containing the observer, but is not visible from the observer’s
current viewpoint. Color Plate III depicts the eye-to-object
visibility set for this observer viewpoint in three dimensions.


Figure 9: Eye-to-object visibility. Shown are only the
potentially visible objects, i.e. the black objects from Figure 5.


Object  Hierarchy 

After we have culled away portions of the model that are
invisible from the observer’s viewpoint, we can further reduce
the number of faces rendered in each frame by choosing an
appropriate level of detail at which to render each visible
object. Since the image must ultimately be displayed in pixels,
it is useless to render very detailed descriptions of objects
that are very small or far away from the observer and which map
to just a few pixels on the display (Figure 10). Likewise, it
is wasteful to render details in objects that are moving
quickly across the screen and which appear blurred or can be
seen for only a short amount of time (Figure 11). Instead, we
can achieve the same visual effect by rendering simpler
representations of these objects, consisting of just a few
faces with appropriate colors. This is a technique used by
commercial flight simulators; however, little has been
published on these systems [12].


Figure  10:  Perceptible  detail  is  related  to  apparent  size. 




Figure  11:  Perceptible  detail  is  related  to  apparent  speed. 

Rather than rendering all objects at the highest level of
detail in every frame, we choose a level of detail at which to
render each object based on its apparent size and speed from
the point of view of the observer. For each level of detail, we
estimate the size of an average face in pixels, and the speed
of an average face in pixels per frame. We render an object at
the highest level of detail for which the average size of a
face is greater than some threshold, and the size of an average
face divided by its speed is greater than another threshold. If
either of these values is less than the corresponding threshold
for all available levels of detail of an object, we render the
object at its lowest level of detail.
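
The selection rule can be sketched as below. The sketch reads the rule as picking the most detailed level whose average face still exceeds both thresholds, which is the reading consistent with a threshold of zero selecting full detail; the function name, the argument layout, and the handling of zero speed are assumptions made for illustration.

```python
from typing import Sequence


def choose_lod(face_sizes: Sequence[float], face_speeds: Sequence[float],
               min_size: float, min_size_per_speed: float) -> int:
    """Pick a level of detail for one object.

    face_sizes[i]  : estimated average face size in pixels at level i
    face_speeds[i] : estimated average face speed in pixels per frame at level i
    Level 0 is the coarsest description; higher indices are more detailed.
    """
    for level in range(len(face_sizes) - 1, -1, -1):    # most detailed first
        size, speed = face_sizes[level], face_speeds[level]
        # A stationary face (speed 0) always passes the size/speed test.
        slow_enough = speed == 0 or size / speed > min_size_per_speed
        if size > min_size and slow_enough:
            return level
    return 0                                            # nothing qualifies: coarsest
```

For an object whose four levels have average faces of 4096, 1024, 256 and 64 pixels, a size threshold of 200 pixels selects level 2 when the object is still, and level 1 when its faces move 100 pixels per frame and the size/speed threshold is 5.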

As the observer moves through the model, an object may be
rendered at different levels of detail in successive frames.
Rather than abruptly snapping from one level of detail to the
next, we blend successive levels of detail using partial
transparency. Since the complexity of any level is typically
small compared to that of the next higher level (by more than a
factor of two), the extra time spent blending the two levels
during a transition does not constitute an undue overhead,
considering the small fraction of objects making a transition
at the same time.

5.2  Memory  Management 

Since  the  entire  model  cannot  be  stored  in  memory  at  once, 
we  must  choose  a  subset  of  objects  to  store  in  memory  for 
each  frame,  and  swap  objects  in  and  out  of  memory  in  real¬ 
time  as  the  observer  moves  through  the  model.  As  a  min¬ 
imum,  we  must  store  in  memory  all  objects  to  be  rendered 
in  the  next  frame.  However,  since  it  takes  a  relatively  large 
amount  of  time  to  swap  data  from  disk  into  memory,  we  must 
also  predict  which  objects  might  be  rendered  in  future  frames 
and  begin  swapping  them  into  memory  in  advance.  Otherwise,
frame  updates  might  be  delayed,  waiting  for  objects  to
be  read  from  disk  before  they  can  be  rendered.

As described in Section 4.2, we group each level of detail for
all objects incident upon the same cell contiguously in the
display database. To take advantage of the relative efficiency
of large I/O operations, we always load all objects incident
upon the same cell into memory together at the same level of
detail. Thus, our memory management algorithm must compute for
each frame which cell contents to store in memory at which
levels of detail.

In general, we store in memory the contents of the cells
containing the objects most likely to be rendered in upcoming
frames. Specifically, we determine which cells are most likely
to contain the observer in upcoming frames, and store in memory
all objects incident upon cells visible from any of these
cells. Each time the observer steps across a cell boundary, we
traverse the cell adjacency graph, considering cells in order
of the minimum amount of time before the cell can possibly
contain the observer using a shortest path algorithm. The user
interface also enforces some limits on the size of a step or
turn that the observer may take in a single frame. For each
cell C visited in the search, we mark and claim memory for the
contents of all cells visible from C in the direction of the
observer’s frustum up to the precomputed maximum level of
detail at which any object incident upon the cell might be
rendered for an observer in C. Our search terminates when all
available memory has been claimed or when we have considered
all possible observer viewpoints more than some maximum amount
of time in the future. We then read the contents of all newly
marked cells into memory, possibly replacing the contents of
unmarked cells.
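
A simplified sketch of this search, using Dijkstra's algorithm as the shortest path algorithm. Several things are assumptions made for illustration: the function and parameter names, the per-edge travel times, and the use of a single byte count per cell in place of per-level sizes. The restriction to the observer's frustum direction is left out.

```python
import heapq
from typing import Dict, List, Set, Tuple


def plan_resident_cells(start: str,
                        adjacency: Dict[str, List[Tuple[str, float]]],
                        visible_from: Dict[str, List[str]],
                        size_of: Dict[str, int],
                        budget: int,
                        horizon: float) -> Set[str]:
    """Return the cells whose contents should be held in memory.

    adjacency[c]    : (neighbour, minimum travel time in seconds) pairs
    visible_from[c] : cells visible from c (from the precomputation phase)
    size_of[c]      : bytes needed to hold the contents of c
    budget          : total bytes that may be claimed
    horizon         : ignore observer positions further than this in the future
    """
    marked: Set[str] = set()
    used = 0
    best = {start: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    while heap:
        t, cell = heapq.heappop(heap)          # cell reachable soonest
        if t > best.get(cell, float("inf")):
            continue                           # stale queue entry
        if t > horizon:
            break                              # too far in the future
        for v in visible_from.get(cell, []):
            if v in marked:
                continue
            if used + size_of[v] > budget:
                return marked                  # memory exhausted
            marked.add(v)
            used += size_of[v]
        for nxt, dt in adjacency.get(cell, []):
            if t + dt < best.get(nxt, float("inf")):
                best[nxt] = t + dt
                heapq.heappush(heap, (t + dt, nxt))
    return marked
```

The caller would then read the newly marked cells and may evict any resident cell that is not in the returned set.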

For  instance,  consider  the  observer  viewpoint  shown  in 
Figure  12.  Cells  are  labeled  by  the  minimum  amount  of  time 
(in  seconds)  before  they  can  possibly  become  visible  to  the 
observer,  and  shaded  by  the  level  at  which  their  contents  are 
stored  in  memory  -  darker  shades  represent  higher  levels. 
The  cells  surrounded  by  the  thick-dashed  line  represent  the 
cells  visited  during  the  search,  i.e.  the  range  of  observer  po¬ 
sitions  for  which  we  store  visible  objects  in  memory. 


Figure 12: Cells labeled by the number of seconds before
they can possibly become visible to the observer, and shaded
by level of detail stored in memory (a darker shade represents
a higher level of detail). White cells are not loaded into
memory.


6  Results  and  Discussion 

In  this  section  we  present  and  analyze  test  results  collected 
during  real  interactive  walkthroughs  performed  with  our  sys¬ 
tem.  During  these  tests,  we  logged  statistics  regarding  the 
performance  of  our  display  and  memory  management  algo¬ 
rithms  in  real  time  as  a  user  walked  through  the  building 
model. 

We present results for one observer viewpoint used as an
example in the previous discussion (marked by an ‘A’ in
Figure 13), as well as for a full sequence of observer
viewpoints generated during an actual walkthrough along the
path shown in Figure 13. The path is about 300 feet long, and a
realistic physical walk along it should take approximately one
minute. All tests were performed on a VGX 320 Silicon Graphics
workstation with two 33 MHz processors and W MB of memory.

Figure 13: Test path through the building model.


Display  Management 

As discussed in Section 5.1, we compute the set of potentially
visible objects by generating successively smaller supersets,
culling away objects invisible to the observer. The sizes of
these sets, and the times (in seconds) required to render them,
are shown for viewpoint ‘A’ in Table 1 and averaged over the
test walkthrough path in Table 2. On average, we are able to
cull away 94% of the model and reduce rendering time by a
factor of 17 by rendering only objects in the eye-to-object
visibility set rather than the entire building model.


Culling             #         #       Draw       % of
Method            Objs.     Faces     Time (s)   Model

Entire model      2,320    242,668    3.77       100%
Cell-to-cell      1,065    109,227    1.77        45%
Cell-to-object      558     40,475    0.65        17%
Eye-to-cell         241     30,265    0.52        12%
Eye-to-object       165     18,927    0.33       7.8%

Table 1: Visibility cull results for viewpoint ‘A’.


Culling             #         #       Draw       % of
Method            Objs.     Faces     Time (s)   Model

Entire model      2,320    242,668    3.66       100%
Cell-to-cell        778     78,475    1.22        32%
Cell-to-object      440     36,921    0.59        15%
Eye-to-cell         207     20,657    0.34       8.5%
Eye-to-object       141     13,701    0.23       5.6%

Table 2: Average visibility cull results for test walkthrough.

We further reduce the number of faces rendered at each frame by
choosing an appropriate level of detail at which to render each
potentially visible object based on its apparent size and speed
to the observer. Statistics regarding the number of faces and
the time required to render each frame using different
pixels-per-face thresholds for viewpoint ‘A’ and averaged over
the test path are shown in Tables 3 and 4, respectively. Usable
rendering modes for which little or no degradation in image
quality is perceptible (> 256 pixels per face) are shown in
bold typeface.

Color  Plates  IV,  V  and  VI  show  the  difference  between  a 
static  image  produced  using  the  highest  level  of  detail  for  all 
objects  (Plate  IV)  and  one  generated  with  reduced  levels  of 
detail  for  objects  with  fewer  than  256  pixels  per  face  (Plate
V).  Plate  IV  has  23,468  faces  and  took  0.34  seconds  to  render,
whereas  Plate  V  has  7,555  faces  and  took  0.17  seconds.
These  images  were  rendered  without  interpolated  shading  or 
antialiasing  in  order  to  accentuate  differences  -  notice  the 
reduced  tessellation  of  the  chairs  further  from  the  observer. 
Plate  VI  shows  which  level  of  detail  was  used  for  each  object 
in  Plate  V  (a  darker  shade  represents  a  higher  level  of  detail). 

Overall, after computing the set of potentially visible objects
and choosing an appropriate level of detail for each object, we
are able to cull away an average of 97% of the building model
and reduce rendering time by an average factor of 39 in each
frame.


Min. Pixels         #         #       Draw       % of
Per Face          Objs.     Faces     Time (s)   Model

0                   165     18,927    0.33       7.8%
64                  165     11,763    0.26       4.8%
128                 165      8,861    0.22       3.6%
256                 165      6,204    0.17       2.6%
512                 165      3,889    0.13       1.6%
1024                165      2,871    0.12       1.2%

Table 3: Average detail cull results for viewpoint ‘A’.


Min. Pixels         #         #       Draw       % of
Per Face          Objs.     Faces     Time (s)   Model

0                   141     13,701    0.23       5.6%
64                  141      9,700    0.18       4.0%
128                 141      7,979    0.16       3.3%
256                 141      6,176    0.14       2.5%
512                 141      4,745    0.12       2.0%
1024                141      3,427    0.10       1.4%

Table 4: Average detail cull results for test walkthrough.


Memory  Management 

As described in Section 5.2, the memory manager tries to store
in memory the objects incident upon the cells that are most
likely to be visible to the observer in upcoming frames in
order of decreasing urgency. One of the two processors of the
VGX is used for pre-fetching data concurrently with the
rendering of the current frame. The results presented here were
gathered from a walk along the test path shown in Figure 13.
Since the current floor model is not very large compared to the
memory capacity of our machine, we impose an artificial SMB
limit on the amount of object data that can be stored in memory
at any one time. As the observer “walks” along the path, we
swap data in and out of memory, never exceeding the SMB limit.
We are still experimenting with techniques to control the
interaction between our memory management algorithm and the
paging of the operating system. Thus the data below must be
regarded as tentative and rather preliminary. More reliable
data will be gathered once the fully furnished model of the
whole building becomes available.

Figure 14 shows a plot of the number of bytes that must be in
memory in order to render the visible parts of the scene (lower
curve); superimposed is a plot of the number of bytes our
algorithm loads into memory in preparation for possible
near-term observer moves. As expected, these amounts of data
fluctuate strongly depending on whether the observer is in a
relatively simple part of the model with rather confined views,
or whether the visible cells stretch out to great depth along
several directions. In all, we read 52MB during the 261 frames.


Figure 14: Comparison of the amounts of data fetched from
disk (top curve) and actually needed for rendering (bottom
curve) while following the walkthrough test path; marked
spots correspond to the labels shown in Figure 13.

In general, we are able to pre-fetch objects before they are
rendered, and so the observer can move smoothly through the
model. However, there are a few cases in which the memory
manager is not able to predict which objects are going to
become visible to the observer far enough in advance to
pre-fetch them, and so the user may have to wait while they are
read into memory. As the observer turns a corner in a corridor,
the visible set of objects can change dramatically. This
prompts a request for a large amount of new data to be loaded
into memory. For the worst-case corners (labels ‘B’ and ‘C’),
the coprocessor is busy for about 8 seconds to prefetch on the
order of 2 MB of data that might be used in the near future.
However, the amount of data needed immediately for the
rendering of the next frame is much smaller; because of
parallel processing, resulting observable delays are on the
order of a couple of seconds for a worst-case situation in our
model. We are developing more sophisticated pre-fetching
techniques that use a better prediction of the observer’s
motion.

7  Conclusion 

Our paper describes a system for interactive walkthroughs of
very large architectural models. It builds a hierarchical
display database containing objects represented at multiple
levels of detail during the modeling phase, performs a spatial
subdivision and visibility analysis during a precomputation
phase, and uses real-time display and memory management
algorithms during a walkthrough phase to judiciously select a
relevant subset of data for rendering. We have implemented a
first version of this system, and tested it in real
walkthroughs of a completely furnished model of the sixth floor
of the planned Computer Science building at UC Berkeley. Our
initial results show that these display and memory management
techniques are effective at culling away substantial portions
of the model, and make interactive frame rates possible even
for very large models.

Abstract

This  paper  introduces  an  efficient  object-precision 
shadow  generation  algorithm  for  static  polygonal 
environments  directly  illuminated  by  convex  area  light 
sources.  Penumbra  and  umbra  regions  are  calculated 
analytically  and  represented  as  a  pair  of  BSP  trees  for  each 
light  source.  As  the  trees  are  built,  convex  scene  polygons  are 
filtered  down  the  trees,  and  split  into  fragments  that  are  wholly 
lit,  in  penumbra,  or  in  umbra.  The  illumination  due  to  the  light 
source  is  calculated  at  selected  points  within  the  wholly  lit  and 
penumbra  regions  by  contour  integration  with  the  visible  parts 
of  the  light  source.  We  use  a  fast  analytic  algorithm  to 
compute  the  fragments  of  the  area  light  source  visible  from  a 
point  in  penumbra.  Rendering  is  done  using
hardware-supported  linear  interpolated  shading  on  a  3D  graphics
workstation. 

Because  the  scene  itself  is  represented  as  a  BSP  tree, 
visible-surface  determination  may  be  performed  by  using
either  workstation-supported  hardware  (e.g.,  a  z-buffer)  or 
software  BSP-tree  traversal.  We  provide  sample  images 
created  by  our  implementation,  including  timings  and  polygon 
counts. 

CR Categories and Subject Descriptors: I.3.3 [Computer
Graphics]: Picture/Image Generation—Display algorithms;
I.3.5 [Computer Graphics]: Computational Geometry and
Object Modeling—Constructive solid geometry (CSG); I.3.7
[Computer Graphics]: Three-Dimensional Graphics and
Realism—Color, shading, shadowing, and texture

General  Terms:  Algorithms 

Additional  Keywords  and  Phrases:  shadow  volume,  area 
light  source,  BSP  tree,  penumbra,  umbra 

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.

© 1992 ACM 0-89791-471-6/92/0003/0021...$1.50


Introduction 

Shadow generation is a classic problem in 3D computer
graphics that has been addressed by a wide variety of
algorithms [13, 22]. Point light-source shadow algorithms
essentially compute the visibility of parts of the environment
from a point at the light source; therefore any point in the
environment is either fully in or out of shadow. In contrast, in
an environment lit by area light sources, a point in the
environment may be either visible to the entirety of the light
source, visible to no part of the light source (i.e., in the light
source’s umbra), or visible to only a portion of the light source
(i.e., in the light source’s penumbra). In this latter case, to
compute the point’s illumination, it is also necessary to
determine which portions of the area light source are visible
from the point. Since real light sources are not points and
therefore cast both umbrae and penumbrae, an area
light-source shadow algorithm can be used to create pictures that
are more photorealistic in appearance than those created with a
point light-source shadow algorithm.

Shadows from area light sources have been computed
using radiosity approaches [9, 6], by summing the
contributions of an approximating set of point light sources
[5], by ray tracing shadow cones from points in a scene to
spherical light sources [1], by distributed ray-tracing [10], and
by an object-precision algorithm developed by Nishita and
Nakamae [17]. With the exception of this single
object-precision algorithm, all the other algorithms approximate the
shadow boundaries on the objects in the scene. For each pair
of a light source and a polyhedral object, Nishita and Nakamae
compute the volume that the object fully shadows from the
light source (its umbra volume) and the volume that the object
partially shadows from the light source (its penumbra volume).
The intersections of these volumes with the other objects in
the environment are computed and guide the calculation of the
illumination at selected points on the objects. For example, a
point is fully shadowed if it is included in at least one umbra
volume.

The algorithm that we describe here is inspired in part by
this work; unlike Nishita and Nakamae, however, we build a
single merged umbra volume and penumbra volume for each
light source. Furthermore, these volumes are represented as
BSP trees [14, 15, 21, 16] using an efficient extension of the
earlier BSP-tree-based shadow algorithm for point light
sources [7]. Although subdivision is always done along exact
shadow boundaries, further subdivision may be necessary to
compute illumination more accurately. We have used both
regular gridding and adaptive subdivision of fragments in the
penumbra and wholly lit regions to compute the illumination
at additional points.

Background 

The binary space-partitioning (BSP) tree visible-surface
algorithm was developed by Fuchs, Kedem, and Naylor [14],
based in part on the work of Schumacker [19, 20]. A BSP tree
defines a recursive partitioning of space by planes that embed
the polygons in the scene. The tree’s root is a polygon chosen
from those in the scene. This polygon’s plane partitions space
into two half-spaces: the “positive” half-space contains all
other polygons in front of the root’s plane (on the side into
which its normal points); the “negative” half-space contains all
polygons behind the root’s plane. If a polygon straddles the
root’s plane, it is cut by it and each of its pieces is assigned to
the appropriate half-space. One polygon each from the
positive and negative half-spaces are then selected to become
the root’s children. Each child is then recursively used to
divide the remaining children in its half-space in the same
way. The tree is complete when each leaf node contains a
single polygon whose half-spaces are both empty. The BSP
tree visible-surface algorithm is a modified inorder traversal of
the scene’s BSP tree, guided by a simple comparison of the
eyepoint with each polygon’s plane; this determines in O(n)
time a back-to-front ordering of the polygons for any eyepoint.
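
The modified inorder traversal can be sketched as follows, assuming a tree that has already been built. The node layout (a polygon name plus the plane normal . x = offset that embeds it) and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class BSPNode:
    name: str                          # label of the polygon stored at this node
    normal: Sequence[float]            # plane normal (points into the front half-space)
    offset: float                      # plane equation: normal . x = offset
    front: Optional["BSPNode"] = None  # subtree in the positive half-space
    back: Optional["BSPNode"] = None   # subtree in the negative half-space


def side(node: BSPNode, point: Sequence[float]) -> float:
    """Signed value of the plane equation at the point (> 0 means in front)."""
    return sum(n * c for n, c in zip(node.normal, point)) - node.offset


def back_to_front(node: Optional[BSPNode], eye: Sequence[float],
                  out: Optional[List[str]] = None) -> List[str]:
    """List polygon names from farthest to nearest as seen from the eyepoint."""
    if out is None:
        out = []
    if node is None:
        return out
    if side(node, eye) > 0:            # eye in front: the back subtree is farther
        far, near = node.back, node.front
    else:                              # eye behind (or on) the plane
        far, near = node.front, node.back
    back_to_front(far, eye, out)
    out.append(node.name)
    back_to_front(near, eye, out)
    return out
```

Each node is visited once, so the ordering is produced in time linear in the number of polygons. Swapping the roles of far and near gives the front-to-back order used by the shadow algorithms discussed later.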

Thibault and Naylor [21] showed that BSP trees can be
used to represent polyhedral solids. Each of the empty regions
at the leaves is associated with a value of either “in” or “out”.
Assuming that each polygon that bounds a polyhedron has a
normal that points out of the polyhedron, then an “in” region is
bounded in part by the polygon’s negative (back) half-space
and an “out” region is bounded in part by the polygon’s
positive (front) half-space. The BSP tree’s leaf nodes
tessellate space into a set of convex polyhedral regions, a
subset of which (the “in” regions) represent the solid.
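
Classifying a point against such a tree amounts to walking from the root to a leaf. The sketch below assumes a hypothetical encoding in which an interior node is a tuple (normal, offset, front, back) and a leaf is a boolean (True for “in”, False for “out”); the helper convex_solid builds the simple chain-shaped tree of a convex polyhedron from its outward-facing planes. Points lying exactly on a plane are sent to the back side, which is an arbitrary choice.

```python
from typing import Sequence, Tuple, Union

# An interior node is (normal, offset, front, back); a leaf is True ("in") or False ("out").
Tree = Union[bool, Tuple[Sequence[float], float, object, object]]


def convex_solid(planes: Sequence[Tuple[Sequence[float], float]]) -> Tree:
    """Build the BSP tree of a convex polyhedron from outward-facing planes
    (normal . x = offset). Every front leaf is "out"; the last back leaf is "in"."""
    tree: Tree = True
    for normal, offset in reversed(planes):
        tree = (normal, offset, False, tree)
    return tree


def classify(tree: Tree, point: Sequence[float]) -> bool:
    """Return True if the point falls in an "in" region of the tree."""
    while not isinstance(tree, bool):
        normal, offset, front, back = tree
        s = sum(n * c for n, c in zip(normal, point)) - offset
        tree = front if s > 0 else back
    return tree
```

For the unit cube described by its six outward-facing planes, the point (0.5, 0.5, 0.5) is classified as “in” and the point (1.5, 0.5, 0.5) as “out”.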

The point light-source shadow algorithm described in [7,
8] uses BSP trees to model the polyhedral shadow volumes
[11] cast by convex polygons. We call the BSP tree
representation of the shadow volume the SVBSP (Shadow
Volume BSP) tree. A regular BSP tree is first constructed for
all polygons in the scene. (Note that if the scene is modified,
then the scene BSP tree must be recalculated.) The scene BSP
tree allows the shadow algorithm to obtain all scene polygons
efficiently in front to back order relative to an arbitrary point
light source. Only scene polygons that face the light are
selected. The point light source and the first scene polygon
chosen define together a shadow volume that is a semi-infinite
pyramid. Each of the pyramid’s faces is embedded in a plane
defined by the light source and an edge of the scene polygon.
A point will be in shadow if it lies within the pyramid and in
the scene polygon’s negative half-space. The scene polygon is
itself fully lit.
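
The shadow test for a single polygon can be sketched directly from this description. The function below is an illustration only and does not build an SVBSP tree: it tests one point against the pyramid of one convex polygon. Its name and the use of the polygon's centroid to decide which side of each face is "inside" are assumptions; the normal is flipped toward the light if necessary, whereas the algorithm in the text simply skips polygons that face away from the light.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def in_shadow(light: Vec3, polygon: Sequence[Vec3], point: Vec3) -> bool:
    """True if the point lies inside the semi-infinite pyramid defined by the
    light and the convex polygon, and behind the polygon as seen from the light."""
    n = cross(sub(polygon[1], polygon[0]), sub(polygon[2], polygon[0]))
    if dot(n, sub(light, polygon[0])) < 0:
        n = (-n[0], -n[1], -n[2])              # orient the normal toward the light
    if dot(n, sub(point, polygon[0])) >= 0:
        return False                           # not in the negative half-space
    k = len(polygon)
    centroid = (sum(v[0] for v in polygon) / k,
                sum(v[1] for v in polygon) / k,
                sum(v[2] for v in polygon) / k)
    for i in range(k):
        a, b = polygon[i], polygon[(i + 1) % k]
        m = cross(sub(a, light), sub(b, light))   # plane through light and edge ab
        # The point must be on the same side of this face as the polygon's interior.
        if dot(m, sub(point, light)) * dot(m, sub(centroid, light)) <= 0:
            return False
    return True
```

With a light at (0, 0, 10) and a square of half-width 1 at height z = 5, the shadow on the plane z = 0 is a square of half-width 2, so (1.5, 0, 0) is shadowed and (3, 0, 0) is lit.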

Because of the front to back order imposed by the BSP
tree traversal, each new scene polygon processed is guaranteed
not to block any of the previously selected scene polygons
from the light. It may be wholly or partially in shadow itself,
however. To determine which parts of the new polygon are
visible from the light source, we must partition the polygon
into parts that are inside and outside the current SVBSP-tree
shadow volume. Note that there is no need to compare the
new polygon with the planes that embed the previous scene
polygons, since the BSP-tree traversal order ensures that the
new polygon does not lie between the light source and the
preceding scene polygons. Those parts of the new polygon
that are inside the shadow volume are in shadow; those parts
that are outside it are lit. Furthermore, any parts that are
outside define additional shadow volumes that must be added
to the SVBSP tree. The point light-source algorithm
efficiently combines these two steps of classifying polygon
fragments and enlarging the SVBSP tree by using a simplified
version of the Boolean set union operation algorithm presented
in [21]. Each remaining polygon is processed in this fashion
to determine which of its parts are shadowed.

Like the BSP-tree point light-source shadow algorithm,
our BSP-tree convex area light-source algorithm supports
multiple light sources. The area light-source algorithm
extends the point light-source algorithm by classifying
polygons  into  fragments  that  are  wholly  lit,  in  penumbra 
(partially  blocked  from  the  light  source),  or  in  umbra  (wholly 
blocked  from  the  light  source).  To  do  this,  we  must  first 
define  the  umbra  and  penumbra  volumes  of  an  area  light 
source. 

Constructing  Penumbra  and  Umbra 
Volumes 

In  environments  composed  of  convex  polygons 
illuminated  by  convex  light  sources,  the  penumbra  and  umbra 
volumes  associated  with  a  single  scene  polygon  can  be 
constructed  entirely  from  three  kinds  of  planes: 

•  scene  polygon  planes,  a  single  one  of  which  is 
defined  by  the  scene  polygon  itself. 

•  light-source vertex planes, defined by a vertex of
the light source and an edge of a scene polygon,
oriented so that the scene polygon is entirely in
the plane’s negative half-space or on the plane.

•  light-source edge planes, defined by an edge of
the light source and a vertex of a scene polygon,
oriented so that the scene polygon is entirely in
the plane’s negative half-space or on the plane.

We use Nishita and Nakamae’s criteria for determining
those planes that define the penumbra and umbra volumes of a
scene polygon. The penumbra volume is the intersection of
the scene polygon’s negative half-space with the negative
half-spaces of certain light-source vertex planes and light-source
edge planes. These light-source vertex planes and light-source
edge planes are those for which the vertices of the light source
are entirely in the plane’s positive half-space or on the plane.
(The penumbra volume actually encloses points in umbra, as
well as those in penumbra.)

Figure 1 shows the penumbra cast on a large polygon by a
triangle light source illuminating a quadrilateral scene
polygon. Dashed lines passing from each light-source vertex to
all scene polygon vertices define the light-source vertex planes
and light-source edge planes. (The additional fragmentation
surrounding the penumbra outline is caused by the algorithm’s
classification process, which we describe later.) Note that the
planes that bound the penumbra volume are those that have the
light source in their positive half-space and the scene polygon
in their negative half-space. Thus, any point in the positive
half-space of such a plane cannot be blocked from any part of
the light source by the scene polygon.

Figure 1: Penumbra of area light source, with light-source
vertex planes and light-source edge planes.

Figure 2: Penumbra and umbra, with light-source vertex
planes and light-source edge planes.

The umbra volume, which is contained entirely within the
penumbra volume, is the intersection of the scene polygon’s
negative half-space with the negative half-spaces of certain
light-source vertex planes. These light-source vertex planes
are those for which the vertices of the light source are entirely
in the plane’s negative half-space or on the plane. No
light-source edge planes contribute to the umbra volume.
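
The plane-selection criteria for the penumbra and umbra volumes can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and the tolerance `eps` are hypothetical, both polygons are assumed convex, and the light source is assumed to lie wholly in the positive half-space of the scene polygon.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(p0, p1, p2):
    n = cross(sub(p1, p0), sub(p2, p0))
    return n, -dot(n, p0)

def side(pl, x):
    return dot(pl[0], x) + pl[1]

def flip(pl):
    n, d = pl
    return (-n[0], -n[1], -n[2]), -d

def polygon_negative(pl, polygon, eps):
    """Orient pl so the polygon is in its negative half-space or on it.
    Returns None if the plane cuts through the polygon."""
    s = [side(pl, v) for v in polygon]
    if all(x <= eps for x in s):
        return pl
    if all(x >= -eps for x in s):
        return flip(pl)
    return None

def shadow_planes(light, polygon, eps=1e-9):
    """Return (penumbra_planes, umbra_planes) for one scene polygon."""
    nl, npoly = len(light), len(polygon)
    candidates = []
    for v in light:                      # light-source vertex planes
        for i in range(npoly):
            pl = plane_through(v, polygon[i], polygon[(i + 1) % npoly])
            candidates.append((pl, True))
    for j in range(nl):                  # light-source edge planes
        for a in polygon:
            pl = plane_through(light[j], light[(j + 1) % nl], a)
            candidates.append((pl, False))
    penumbra, umbra = [], []
    for pl, is_vertex_plane in candidates:
        if dot(pl[0], pl[0]) < eps:      # degenerate (collinear points)
            continue
        pl = polygon_negative(pl, polygon, eps)
        if pl is None:
            continue
        s = [side(pl, v) for v in light]
        if all(x >= -eps for x in s):    # light entirely positive or on plane
            penumbra.append(pl)
        elif is_vertex_plane and all(x <= eps for x in s):
            umbra.append(pl)             # light entirely negative or on plane
    return penumbra, umbra

def classify_point(x, light, polygon, eps=1e-9):
    """Classify x as 'lit', 'penumbra' or 'umbra' relative to one polygon."""
    poly_plane = plane_through(polygon[0], polygon[1], polygon[2])
    if side(poly_plane, light[0]) < 0:
        poly_plane = flip(poly_plane)
    if side(poly_plane, x) >= -eps:
        return "lit"
    penumbra, umbra = shadow_planes(light, polygon, eps)
    if any(side(pl, x) > eps for pl in penumbra):
        return "lit"
    if any(side(pl, x) > eps for pl in umbra):
        return "penumbra"
    return "umbra"
```

The sketch tests a single point against a single polygon by brute force; the algorithm itself stores the selected planes in BSP trees so that whole polygons can be classified against the merged volumes.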


Figure 3: Shadows cast by point light sources at the
vertices  of  an  area  light  source. 


Figure 2 shows the same scene as Figure 1 with the umbra
included. Note that the dashed lines that lie in the planes that
define the umbra outline do not always pass through the umbra
outline’s vertices.

Figure  3  shows  an  alternative,  but  exactly  equivalent,  way 
to  define  the  umbra  and  penumbra  volumes.  They  can  be 
derived  from  the  shadow  volumes  generated  when  the  convex 
scene  polygon  is  illuminated  by  point  light  sources  at  the 
convex area light source’s vertices. (The additional
fragmentation of the ground plane is caused by the BSP-tree
point light-source shadow algorithm used to create this figure.)
The area light source’s umbra volume contains those points
that  are  blocked  from  all  of  the  area  light  source’s  vertices. 
This  corresponds  to  the  intersection  of  the  point  light-source 
shadow  volumes,  which  is  defined  by  the  set  of  light-source 
vertex  planes  specified  previously. 

The union of the point light-source shadow volumes
encloses all points that are blocked from one or more vertices
of the area light source. This is only a subset of the light
source’s penumbra volume, however, since it does not include
those points that are visible from all the area light source’s
vertices, but are blocked from part of the area light source’s
interior. It can be shown that to enclose these points the
penumbra  volume  must  be  the  convex  hull  of  the  point  light- 
source  shadow  volumes.  The  convex  hull  is  defined  by  the  set 
of  light-source  vertex  planes  and  light-source  edge  planes 
specified  previously. 

Overview 

Instead of the single SVBSP tree required by the point
light-source shadow algorithm, we use two BSP trees: a
penumbra tree and an umbra tree [5]. Each BSP tree internal
node is defined by a light-source vertex plane or light-source
edge plane.

Much like the point light-source shadow algorithm, two
steps must be performed for each scene polygon:

•  Classifying the polygon into wholly lit, penumbra,
and  umbra  fragments. 

•  Enlarging  the  penumbra  and  umbra  trees  with 
light-source  vertex  planes  and  light-source  edge 
planes  defined  by  the  polygon. 

The  classified  fragments  must  then  be  illuminated  and 
scan-converted. 


Algorithm 

Preprocess.  An  obvious  approach  to  classification  would 
be  to  compare  each  scene  polygon  with  the  shadow  volume  of 
every  other  scene  polygon.  However,  polygons  that  are  not  in 
the  same  half-space  of  a  polygon  as  the  light  source  cannot 
cast  shadows  on  that  polygon  or  any  other  polygon  in  the  light 
source’s  half-space.  Therefore,  as  in  the  point  light-source 
shadow  algorithm,  we  first  compute  a  BSP  tree  for  the  entire 
scene.  This  allows  us  to  perform  a  modified  inorder  traversal 
of  the  tree  to  process  scene  polygons  in  front-to-back  order 
relative  to  the  light  source. 
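
The front-to-back traversal mentioned above can be sketched as a modified inorder walk of the scene BSP tree. The `BSPNode` class and its field names are assumptions made for this illustration; planes are stored as `(normal, d)` with `dot(normal, x) + d = 0`.

```python
class BSPNode:
    """Hypothetical scene BSP node: a partitioning plane, the polygons
    embedded in that plane, and front/back subtrees (None for empty)."""
    def __init__(self, plane, polygons, front=None, back=None):
        self.plane = plane
        self.polygons = polygons
        self.front = front
        self.back = back

def front_to_back(node, eye):
    """Yield polygons in front-to-back order relative to the point `eye`."""
    if node is None:
        return
    (nx, ny, nz), d = node.plane
    s = nx * eye[0] + ny * eye[1] + nz * eye[2] + d
    # Visit the subtree on the same side as the eye first, then the node's
    # own polygons, then the subtree on the far side.
    if s >= 0:
        near, far = node.front, node.back
    else:
        near, far = node.back, node.front
    yield from front_to_back(near, eye)
    yield from node.polygons
    yield from front_to_back(far, eye)
```

For the shadow algorithm, `eye` would be the point light source (or, for an area light source, a representative point on it), and polygons that do not face the light would be skipped or trivially classified as the traversal yields them.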

Unlike  a  point  light  source,  an  area  light  source  may  not 
lie  entirely  in  a  single  half-space  of  a  scene  polygon.  If  this 
occurs,  choosing  different  points  on  the  area  light  source  will 
generate  different  BSP-tree  traversal  orders.  To  obtain  a 
unique  order,  we  first  split  each  area  light  source  by  those 
scene  polygons  that  intersect  it  and  that  are  in  the  lit  half-space 
of  the  light  source’s  plane.  Since  each  of  the  resulting  light 
sources  is  wholly  on  one  side  of  each  scene  polygon,  any  point 
within  the  light  source  will  generate  the  same  front-to-back 
ordering  of  the  scene  polygons.  For  convenience,  we  pick  the 
centroid of each resulting area light source as the point from
which  to  compute  the  ordering.  We  must  also  ensure  that  each 
scene  polygon  that  straddles  a  light  source  plane  is  split  by  the 
plane. 

Classification.  Classification  and  tree  enlargement  are 
interleaved as they are performed incrementally for each scene
polygon  in  front-to-back  order.  Therefore,  the  two  shadow 
trees  represent  the  merged  penumbra  and  umbra  volumes  of  all 
the  scene  polygons  processed  thus  far.  Classification  occurs 
by  filtering  each  polygon  down  one  or  both  shadow  trees. 

This  process  is  applied  recursively  until  all  of  a  polygon’s 
fragments  reach  the  "in”  and  "out”  leaves. 

A polygon is first filtered down the penumbra tree. Any
fragment that reaches an “out” cell is marked as wholly lit and
will not be compared with the umbra tree. (Recall that the
umbra volume is wholly contained within the penumbra
volume, so any fragment outside the penumbra volume cannot
be in umbra.) Any fragment that reaches an “in” cell is at least
in penumbra and may be in umbra. Each such fragment must
then be filtered down the umbra tree. Any fragment that
reaches an umbra tree “out” cell is in penumbra, whereas any
fragment that reaches an umbra tree “in” cell is in umbra. The
penumbra and umbra BSP trees are enlarged by unioning them
with the penumbra volume and umbra volume, respectively,
defined by the full scene polygon. We trivially classify as in
umbra any polygon that is in the back half-space of a light
source, without any need for filtering. In addition, if we
assume that polygons are “one-sided” and that they bound
closed polyhedra, we can also trivially classify as in umbra all
polygons that are back-facing relative to the light source.
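
The two-stage filtering can be illustrated in 2D, where scene polygons become line segments and planes become lines. The sketch below covers classification only, not tree enlargement. The tree encoding (a leaf is the string "in" or "out"; an internal node is a tuple `(line, neg_child, pos_child)` with `line = (a, b, c)` meaning `a*x + b*y + c`) and all function names are assumptions made for this example.

```python
def split_segment(line, seg, eps=1e-9):
    """Split seg by line; return (negative part, positive part), either
    of which may be None."""
    a, b, c = line
    p, q = seg
    sp = a * p[0] + b * p[1] + c
    sq = a * q[0] + b * q[1] + c
    if sp <= eps and sq <= eps:
        return seg, None
    if sp >= -eps and sq >= -eps:
        return None, seg
    t = sp / (sp - sq)
    m = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
    return ((p, m), (m, q)) if sp < 0 else ((m, q), (p, m))

def filter_down(tree, seg):
    """Filter seg down a BSP tree; return [(fragment, leaf label), ...]."""
    if tree in ("in", "out"):
        return [(seg, tree)]
    line, neg_child, pos_child = tree
    neg, pos = split_segment(line, seg)
    result = []
    if neg is not None:
        result += filter_down(neg_child, neg)
    if pos is not None:
        result += filter_down(pos_child, pos)
    return result

def classify(seg, penumbra_tree, umbra_tree):
    """Label fragments of seg as 'lit', 'penumbra' or 'umbra'."""
    result = []
    for frag, label in filter_down(penumbra_tree, seg):
        if label == "out":
            result.append((frag, "lit"))      # never sent to the umbra tree
            continue
        for f2, l2 in filter_down(umbra_tree, frag):
            result.append((f2, "umbra" if l2 == "in" else "penumbra"))
    return result

def convex_tree(lines):
    """Tree for the intersection of the negative half-planes of `lines`."""
    tree = "in"
    for line in reversed(lines):
        tree = (line, tree, "out")
    return tree
```

With a penumbra region -2 <= x <= 2, y <= 0 and an umbra region -1 <= x <= 1, y <= 0, a segment from (-3, -1) to (3, -1) is split into two lit, two penumbra, and one umbra fragment.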


As  in  the  earlier  point  light-source  algorithm,  multiple 
area  light  sources  are  supported  by  pipelining.  The  fragments 
classified  relative  to  one  light  source  must  be  used  as  input  to 
the  algorithm  when  processing  the  next  light  source.  Thus, 
when  all  light  sources  have  been  processed,  each  of  the  output 
fragments  is  uniquely  classified  relative  to  each  of  the  light 
sources.  (See  the  pseudocode  for  the  algorithm  in  the 
appendix.) 

Example.  Figure  4  shows  how  the  algorithm  handles  a 
simple  example.  For  ease  of  explanation,  the  figure  is  drawn 
in  2D  and  thus  shows  umbra  and  penumbra  areas  cast  by  a 
linear light source on lines in the plane. (In 2D, only
light-source vertex edges are needed, but the definitions are the
same  otherwise.) 

Initially, both shadow trees are null (“out”), as shown in
Figure 4(a). Polygon 1 is first filtered down the penumbra tree
and is trivially classified as fully lit. Because no part of the
polygon was classified as in penumbra, no classification is
done using the umbra tree. Next, as shown in Figure 4(b),
polygon 1’s penumbra is used to enlarge the penumbra tree.
Rather than using the many lit fragments that may have been
identified, the original polygon is used instead. In 2D, this
results in a union with polygon 1 and light-source vertex
planes a and b, which define polygon 1’s penumbra volume.
Although polygon 1 was not classified using the umbra tree, it
must be used to enlarge the umbra tree and results in a union
with the volume defined by polygon 1 and the light-source planes
u and v.

Next, polygon 2 is classified, as shown in Figure 4(c).
Much like polygon 1, polygon 2 is classified as wholly lit
relative to the penumbra tree and is not classified using the
umbra tree. The penumbra tree is then enlarged with polygon
2 and planes c and d, and the umbra tree is enlarged using
polygon 2 and planes w and x (Figure 4(d)). Unlike polygon 1,
however, polygon 2’s addition to the merged umbra volume is
not semi-infinite.

Polygon 3 is more interesting. When it is classified
against the penumbra tree, as shown in Figure 4(e), it is split
by face a into fragments 3.1 and 3.2. Fragment 3.1 is
classified as “out” (i.e., wholly lit), while fragment 3.2 is
classified as “in” (i.e., in some combination of penumbra and
umbra). Therefore, only fragment 3.2 must be filtered down
the umbra tree. When this is accomplished, the umbra tree’s v
plane further subdivides fragment 3.2 into fragments 3.2.1 (in
penumbra) and 3.2.2 (in umbra). At this point, both shadow
trees are enlarged using the original polygon 3, as shown in
Figure 4(f). This results (in 2D) in the polygon fragment 3.1
and plane e being added to the penumbra BSP tree and a
volume defined by planes y and z and 3*, the fraction of
polygon 3 not in umbra, being added to the umbra BSP tree.

Illumination 

After classifying all fragments by all light sources, we
need to illuminate them. We use an analytic direct diffuse
illumination model [17] based on contour integration, which is
evaluated at polygon vertices within the penumbra and wholly
lit regions. Unlike full global illumination algorithms,
interreflections are not computed. Points in umbra are lit by
an ambient light component alone. In our implementation,
interpolated shading is performed using 3D graphics hardware.

Figure 4: Classifying polygons and enlarging the penumbra and umbra BSP trees. Parts (a-f) show penumbra and umbra
volumes (areas) and their trees during the classification of three polygons (lines).


Although the classification process divides polygons
along precise shadow boundaries, large polygons may remain
that are homogeneously lit or in penumbra. While direct
illumination should vary continuously across these surfaces,
linear interpolation does not adequately represent these
changes and does not allow any polygon interior pixel to be
brighter than the polygon’s vertices. Therefore, illumination
must be computed at additional points within the scene. In the
pictures included here, we subdivide wholly lit and penumbra
regions using regular grids of user-specified granularity. We
generally use a finer grid in the penumbra region, since the
intensity typically changes more quickly than in an equivalent
wholly lit region. The umbra region is not subdivided because
it receives only constant ambient illumination. Subdivision is
performed after classification, since it has no effect on the
precision at which classification occurs and would increase the
classification overhead if performed first. BSP-tree
subdivision can often generate thin sliver polygons that can
cause shading anomalies. Better results would be obtained
with an adaptive subdivision algorithm that attempted to
generate well-shaped fragments from these potentially
problematic fragments [3].

Diffuse illumination equation. To determine the
illumination at a point that is wholly lit, we perform contour
integration with the light source from the point being lit, as
described in [17]. The diffuse illumination at point p due to
the light source is computed as

$$I_p = \frac{I_{ls}}{2} \sum_{v=1}^{n} \alpha_v \cos(\beta_v),$$

where $I_{ls}$ is the light-source intensity, n is the number of
vertices of the light source, $\alpha_v$ is the angle between the vector
from p to light-source vertex v and the vector from p to
light-source vertex v+1, and $\beta_v$ is the angle between the plane
defined by the two vectors used to compute $\alpha_v$ and the plane on
which p lies. (The cosine of $\beta_v$ may be computed as the dot
product of the normalized surface normal at p with the cross
product of the normalized vectors used to define $\alpha_v$.)
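
The contour integration can be sketched as follows. This assumes the common form of the formula, with a leading factor of one half times the light-source intensity, and light-source vertices ordered so that each term is positive for a receiver that faces the light. The function name and the `intensity` parameter are hypothetical. Because the cross product of two unit vectors has length sin(alpha), the sketch normalizes it explicitly before taking the dot product with the surface normal.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)

def diffuse_illumination(p, normal, light, intensity=1.0):
    """Direct diffuse illumination at p (with unit surface normal `normal`)
    from a fully visible polygonal light source, by contour integration."""
    total = 0.0
    n = len(light)
    for v in range(n):
        u0 = normalize(sub(light[v], p))
        u1 = normalize(sub(light[(v + 1) % n], p))
        # alpha: angle subtended at p by this light-source edge
        alpha = math.acos(max(-1.0, min(1.0, dot(u0, u1))))
        c = cross(u0, u1)
        length = math.sqrt(dot(c, c))
        if length == 0.0:
            continue  # degenerate edge as seen from p
        # cos(beta): angle between the edge's plane through p and the surface
        cos_beta = dot(normal, c) / length
        total += alpha * cos_beta
    return 0.5 * intensity * total
```

As a sanity check, a very large square light directly above an upward-facing point approaches a source covering the whole hemisphere, for which the sum tends to 2*pi and the result tends to pi times the intensity.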

Figure 5: Penumbra volume of a single polygon.


Analytic visibility for penumbra vertices. For points in
penumbra, we must determine the fragments of the light
source  that  are  visible  from  the  point.  We  accomplish  this 
with  a  simplified  version  of  the  earlier  point  light-source 
shadow  algorithm.  By  traversing  the  scene  BSP  tree,  we  can 
obtain  all  polygons  between  the  point  whose  illumination  is 
being  computed  and  the  plane  of  the  light  source.  (Whether 
the  traversal  order  is  back-to-front  or  front-to-back  is 
unimportant.)  As  before,  we  consider  only  those  scene 
polygons  that  are  front-facing  relative  to  the  light  source  (i.e., 
back-facing  relative  to  the  point  being  illuminated). 

For  each  scene  polygon,  we  clip  the  light-source  polygon 
by  the  point  light-source  shadow  volume  defined  by  the  point 
in  penumbra  and  the  edges  of  the  scene  polygon.  The  portion 
of  the  light  source  that  is  inside  this  volume  is  discarded  and 
the  portions  that  are  outside  are  retained  for  comparison  with 
the  next  scene  polygon’s  volume.  (Since  the  original  light- 
source  polygon  bounds  any  light-source  fragments  produced, 
it  can  be  used  to  do  an  extent  check  if  desired.)  The  fragments 
remaining  when  the  BSP-tree  traversal  encounters  the  light- 
source  polygon  are  those  that  are  visible  from  the  point  in 
penumbra  and  we  sum  the  illumination  contributed  by  each 
light-source  fragment. 
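
The clipping of the light-source polygon can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, polygons are convex, each occluder is assumed to lie between the point p and the plane of the light source, and each shadow volume is represented as the intersection of the negative half-spaces of its planes. The `area` helper is included only to check the result.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def side(pl, x):
    return dot(pl[0], x) + pl[1]

def oriented_plane(p0, p1, p2, ref, ref_positive):
    """Plane through p0, p1, p2, flipped so `ref` is on the requested side."""
    n = cross(sub(p1, p0), sub(p2, p0))
    d = -dot(n, p0)
    if (dot(n, ref) + d > 0) != ref_positive:
        n, d = (-n[0], -n[1], -n[2]), -d
    return n, d

def shadow_volume_planes(p, occluder):
    """Planes bounding the volume hidden from p by the occluder."""
    count = len(occluder)
    c = tuple(sum(v[k] for v in occluder) / count for k in range(3))
    planes = [oriented_plane(occluder[0], occluder[1], occluder[2], p, True)]
    for i in range(count):
        a, b = occluder[i], occluder[(i + 1) % count]
        planes.append(oriented_plane(p, a, b, c, False))
    return planes

def split_polygon(poly, pl, eps=1e-9):
    """Split a convex polygon by a plane into (negative, positive) parts."""
    neg, pos = [], []
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        sa, sb = side(pl, a), side(pl, b)
        if sa <= eps:
            neg.append(a)
        if sa >= -eps:
            pos.append(a)
        if (sa < -eps and sb > eps) or (sa > eps and sb < -eps):
            t = sa / (sa - sb)
            m = tuple(a[k] + t * (b[k] - a[k]) for k in range(3))
            neg.append(m)
            pos.append(m)
    return (neg if len(neg) >= 3 else None, pos if len(pos) >= 3 else None)

def visible_fragments(light, occluders, p):
    """Fragments of the light-source polygon that are visible from p."""
    fragments = [light]
    for occ in occluders:
        planes = shadow_volume_planes(p, occ)
        remaining = []
        for frag in fragments:
            inside = frag
            for pl in planes:
                inside, outside = split_polygon(inside, pl)
                if outside is not None:
                    remaining.append(outside)   # retained for next occluder
                if inside is None:
                    break
            # any part still in `inside` is hidden by occ and is discarded
        fragments = remaining
    return fragments

def area(poly):
    total = (0.0, 0.0, 0.0)
    for i in range(1, len(poly) - 1):
        c = cross(sub(poly[i], poly[0]), sub(poly[i + 1], poly[0]))
        total = (total[0] + c[0], total[1] + c[1], total[2] + c[2])
    return 0.5 * dot(total, total) ** 0.5
```

The illumination at p would then be the sum of the contour-integration result over the returned fragments.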

Discussion  and  Implementation 

In the BSP-tree point light-source algorithm, the SVBSP
tree was enlarged to reflect a polygon’s contribution to the
shadow volume by using a simplified version of the set union
algorithm described in [21]. This simplification ignored any
part of a polygon that fell within the existing volume. It used
only planes determined by those fragments of the polygon that
were wholly lit. For a point light source, the volume
determined by these planes is guaranteed not to intersect the
existing shadow volume. (In other words, no fragment lit by a
point light source casts a shadow that falls within the shadow
cast by any other lit fragment.) This is not the case for
penumbra volumes, however. The penumbra volume cast by
one polygon may intersect the volume cast by another.
Therefore, a regular BSP-tree set union operation [21] must be
performed.

Figure 6: Incorrect merged penumbra volume of two
polygons.

Figure 7: Correct merged penumbra volume of two
polygons.

Figure 5 shows the penumbra volume defined by a single
scene polygon. Figure 6 shows the incorrect results that occur
if a second scene polygon is added and the planes defining its
penumbra volume are not continued into the penumbra volume
of the original polygon. In this case, the penumbra volume of
the second polygon considered by itself is similar to that of the
first polygon and overlaps the first polygon’s penumbra
volume. This new penumbra volume crosses over the leftmost
light-source vertex plane bounding the first polygon’s
penumbra volume. Part of the second polygon’s contribution
to the merged penumbra volume is ignored, resulting in the
penumbra gap shown at the bottom of the figure. Figure 7
shows the correct merged penumbra volume that results when
the original volume is enlarged properly by unioning the
second polygon’s penumbra volume with the current
penumbra BSP tree, taking into account the possibility of
overlapping volumes.

Figure 8: Statistics for color plates. All timings are given in elapsed wall-clock seconds for an HP 9000 380 (22 MIPS, 2.6
MFLOPS). Input polygon count takes into account splits caused by building the scene BSP tree. Shadow time is
the time to classify the input polygons. Grid time is the time to subdivide the wholly lit and penumbra regions to
produce the output polygons. Illumination time is the time to determine illumination values for the output vertices.
Actual vertices lists the numbers of wholly lit, penumbra, and umbra vertices. Illuminated vertices lists the
numbers of wholly lit, penumbra, and umbra calculations performed, which is lower than the actual vertex count
because of vertex sharing. (Figures 5 and 6 have one illumination time for each light source, and one set of vertex
statistics for each light source. Note that the sum of the actual wholly lit, penumbra, and umbra vertices is the same
for each light source in these figures.)

If  the  penumbra  volume  were  incomplete,  fragments  that 
were  contained  in  the  volume’s  missing  parts  would  be 
marked as wholly lit and would be incorrectly illuminated.
Therefore,  it  is  essential  that  the  entirety  of  the  actual 
penumbra  volume  be  represented.  In  contrast,  since  the  umbra 
volume  is  contained  within  the  penumbra  volume,  if  the  umbra 
volume  were  incomplete,  fragments  that  were  contained  in  the 
umbra  volume’s  missing  parts  would  be  marked  as  being  in 
penumbra.  Since  the  illumination  algorithm  correctly 
determines  that  these  fragments  are  wholly  blocked  from  the 
light,  they  will  be  correctly  (albeit  expensively)  illuminated. 

It is interesting to note that unioning each polygon’s
umbra volume with the existing umbra volume does not create
the complete set of all points that are fully blocked from the
light source. Instead, it creates the set of all points p such that
there is at least one polygon that fully blocks p from the light
source. That is, the union of the individual polygon umbra
volumes does not contain those points that are fully blocked
from the light source only because of the contributions of
multiple blocking polygons. An example of this occurs in
Figure 4(f). Points in the gap between planes v and y at the
bottom of the volume are fully blocked from the light source
because of the combined effect of polygons 1 and 3, yet do not
lie in the merged umbra volume.

As with most analytic algorithms, care must be taken to
contend with finite floating-point precision. To avoid
problems, as polygons are split, the plane equations are
copied, not recomputed. A similar method can be used to
guarantee that split edges remain truly collinear. When a
polygon edge is split, we also insert the new vertex in any
other polygon that shares the edge. This prevents the shading
discontinuities that would be caused by a “T” vertex. The
vertex at which a split occurs is also shared among the
polygon’s fragments. This allows each vertex’s illumination
computation to be performed only once. It also makes it easy
to determine the kinds of fragments that share a given vertex.
If a vertex is shared by a wholly lit fragment and a penumbra
fragment, we treat the vertex as wholly lit for both, eliminating
the need for the light-source visibility test. If a vertex is
shared by a wholly lit fragment and an umbra fragment, it is
treated differently in each to preserve the boundary. We
currently do not promote vertices shared by both penumbra
and umbra fragments to umbra vertices. This avoids the
possibility of smearing a full umbra shadow into a penumbra
fragment when the umbra fragment is blocked by an object
that does not block the penumbra fragment. This is similar to
the problem of “light leaks” [6], in which a polygon is
straddled by a partition that blocks light from some of its
vertices, even though illumination leaks under the partition
through interpolated shading.

Another possible optimization that would reduce
fragmentation is to merge fragments together when both
subtrees were classified as “in” or as “out” [8]. Since a
penumbra volume extends infinitely far past the object that
casts it, we have also considered some approaches to
restricting its extent, similar to Bergeron’s use of end caps on
shadow volumes to eliminate the need to perform shadow
computations outside of a light’s “sphere of influence” [4].

The area light-source algorithm has been implemented in
C on an HP 9000 380 TurboSRX workstation, and the results
are displayed interactively using hardware interpolated
shading. Because the scene polygons are represented as a BSP
tree, either the hardware z-buffer or a software BSP-tree
visible-surface algorithm can be used to render the scene.

Pictures. Color Plate 1 shows two objects floating in air
and one triangle light source with their penumbra and umbra
regions. The light grey and dark grey fragments are in
penumbra and umbra respectively, while the colored
fragments are wholly lit. The wholly lit and penumbra
fragments have been gridded after classification. Note the
band of penumbra separating the umbra regions of both
objects. As described above, this strip should be in umbra, but
will be properly illuminated because the illumination
computation determines that its vertices are unlit. The same
scene after illumination and interpolated shading is shown in
Color Plate 2.

Color Plate 3 shows a room with one quadrilateral area
light source and gray fragments to represent the regions
identified as being in penumbra and umbra. Color Plate 4
shows the room as it appears after illumination and shading.
Color Plate 5 shows a different view of a simpler version of
the room without the playpen, illuminated by two quadrilateral
light sources. Color Plate 6 shows the same scene as Color
Plate 4, illuminated by both light sources. Figure 8 provides
statistics for the color plates. Figure 9 shows the room
depicted in Color Plate 6, prior to illumination, with the
fragments produced by classification with both light sources
and gridding.

Figure 9: Room scene classified, 2 lights.

Note  that  the  most  expensive  part  of  the  algorithm  is  the 
illumination  phase,  which  need  not  be  accomplished  if  the  user 
is  interested  only  in  classifying  objects  according  to  their 
visibility,  which  is  necessary  in  a  number  of  applications  in 
areas such as computer vision and graphics [12].

Conclusions  and  Future  Work 

The  algorithm  described  here  analytically  generates 
penumbra  and  a  subset  of  the  umbra  for  static  convex 
polygonal  environments  illuminated  by  convex  area  light 
sources.  It  is  relatively  simple  to  implement,  places  no 
restrictions  on  the  location  of  objects  and  light  sources,  and 
runs  efficiently  for  small  scenes  on  modern  workstations  with 
hardware  3D  graphics  support.  To  generate  further  points  at 
which  illumination  is  sampled,  we  have  implemented  both 
regular  gridding  and  simple  adaptive  subdivision  of  those 
fragments  that  are  wholly  lit  or  in  penumbra. 

We believe that an efficient analytic shadow algorithm
would be useful in multiple passes of a radiosity approach (not
just for the initial light-source calculations, as implemented in
[18]). If selected radiators were treated as area light sources,
object-precision shadow boundaries could be determined,
instead of the relatively coarse boundaries obtained with
current adaptive meshing techniques. This may make it
possible to create more accurate images, with the illumination
contour integral used to calculate analytic form factors [2] that
properly take into account obstructions, guided by the shadow
(i.e., visibility) classification phase.

Appendix: Pseudocode

procedure generateShadows (ALSlist, BSPtree)

    for each node n in BSPtree ; scene BSP tree
        copy n.scenePolygon into n.fragmentList
    endfor

    for each als in ALSlist
        centroid := centroid of als

        pBSP := OUT_CELL ; penumbra BSP tree
        uBSP := OUT_CELL ; umbra BSP tree

        for each node n in BSPtree in front-to-back order
                relative to centroid

            ; move n.fragmentList to fragmentList
            ; so that n.fragmentList can be recreated
            ; with fully classified and subdivided fragments
            fragmentList := n.fragmentList
            n.fragmentList := NULL

            for each fragment f in fragmentList
                if f not facing centroid OR als not facing f
                    mark f in umbra
                    n.fragmentList := append(n.fragmentList, f)
                else
                    ; split f into wholly lit & shadowed fragments
                    ; by filtering down pBSP
                    tempFragmentList := NULL
                    classifyWhollyLitOrShadowed
                        (als, pBSP, f, &tempFragmentList)

                    ; partition shadowed fragments into penumbra
                    ; and umbra
                    for each fragment t in tempFragmentList
                        if t is shadowed
                            classifyPenumbraOrUmbra
                                (als, uBSP, t, &n.fragmentList)
                        else
                            n.fragmentList :=
                                append(n.fragmentList, t)
                        endif
                    endfor

                    ; enlarge pBSP and uBSP trees
                    pv := constructPolygonPenumbra(als, n.scenePolygon)
                    pBSP := union(pBSP, pv) ; see [21]
                    uv := constructPolygonUmbra(als, n.scenePolygon)
                    uBSP := union(uBSP, uv)
                endif
            endfor ; fragment
        endfor ; node

        discard pBSP and uBSP
    endfor ; als
endproc

procedure classifyWhollyLitOrShadowed
        (als, pBSP, f, fragmentList)

    if (pBSP is a leaf)
        if (pBSP == OUT_CELL)
            mark f as wholly lit
        else
            mark f as shadowed
        endif
        fragmentList := append(fragmentList, f)
    else
        splitPolygon(pBSP.plane, f, &negPart, &posPart)
        if (negPart != NULL)
            classifyWhollyLitOrShadowed(als, pBSP.negChild,
                negPart, &fragmentList)
        endif
        if (posPart != NULL)
            classifyWhollyLitOrShadowed(als, pBSP.posChild,
                posPart, &fragmentList)
        endif
    endif
endproc


procedure classifyPenumbraOrUmbra
    (als, uBSP, f, fragmentList)

    if (uBSP is a leaf)
        if (uBSP == OUT_CELL)
            mark f as penumbra
        else
            mark f as umbra
        endif
        fragmentList := append(fragmentList, f)
    else
        splitPolygon(uBSP.plane, f, &negPart, &posPart)
        if (negPart != NULL)
            classifyPenumbraOrUmbra(als, uBSP.negChild,
                                    negPart, &fragmentList)
        endif
        if (posPart != NULL)
            classifyPenumbraOrUmbra(als, uBSP.posChild,
                                    posPart, &fragmentList)
        endif
    endif
endproc
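
The two classification procedures above share one pattern: a fragment is filtered down a BSP tree, split at each internal node, and labeled when it reaches a leaf. The following is a minimal Python sketch of that pattern in a 2D analogue (polygons split by lines instead of planes). The `Node` class, the `IN_CELL` leaf name, and the helper names are assumptions made for illustration and do not come from the pseudocode.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float]
Polygon = List[Point]
EPS = 1e-9


@dataclass
class Node:
    """Internal BSP node; leaves are the strings "IN_CELL" / "OUT_CELL"."""
    line: Tuple[float, float, float]  # (a, b, c) of a*x + b*y + c = 0
    neg: Union["Node", str]           # subtree where a*x + b*y + c < 0
    pos: Union["Node", str]           # subtree where a*x + b*y + c > 0


def side(line, p):
    """Signed value of the line equation at point p."""
    a, b, c = line
    return a * p[0] + b * p[1] + c


def split_polygon(line, poly):
    """Split a convex polygon by a line; return (neg_part, pos_part).

    A part is None when fewer than three vertices fall on that side.
    """
    neg, pos = [], []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        sp, sq = side(line, p), side(line, q)
        if sp <= EPS:
            neg.append(p)
        if sp >= -EPS:
            pos.append(p)
        if (sp < -EPS and sq > EPS) or (sp > EPS and sq < -EPS):
            t = sp / (sp - sq)  # parameter of the crossing along edge p-q
            cut = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
            neg.append(cut)
            pos.append(cut)
    return (neg if len(neg) >= 3 else None,
            pos if len(pos) >= 3 else None)


def classify_lit_or_shadowed(tree, fragment, out):
    """Filter a fragment down the tree, appending (polygon, label) to out."""
    if isinstance(tree, str):  # leaf reached
        out.append((fragment, "lit" if tree == "OUT_CELL" else "shadowed"))
        return
    neg_part, pos_part = split_polygon(tree.line, fragment)
    if neg_part is not None:
        classify_lit_or_shadowed(tree.neg, neg_part, out)
    if pos_part is not None:
        classify_lit_or_shadowed(tree.pos, pos_part, out)


def area(poly):
    """Unsigned polygon area (shoelace formula)."""
    s = 0.0
    for i, (x0, y0) in enumerate(poly):
        x1, y1 = poly[(i + 1) % len(poly)]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0
```

With a one-node tree whose negative side is `IN_CELL`, a square straddling the line `x = 1` comes back as one shadowed and one lit fragment of equal area. The penumbra/umbra pass would reuse the same recursion with different leaf labels.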
Abstract

Designing the illumination of a scene is a difficult task
because one needs to render the whole scene in order to look at
the result. Obtaining the correct lighting effects may require
a long sequence of modeling/rendering steps. We propose
to use the highlights and shadows directly in the modeling
process. By creating and altering these lighting effects, the
lights themselves are indirectly modified. We believe this
new technique for designing lighting is more intuitive and can
reduce the number of modeling/rendering steps required to
obtain the desired image.


CR Categories and Subject Descriptors: I.3.7 [Computer
Graphics]: Three-Dimensional Graphics and Realism.
Interaction techniques.

General Terms: Algorithms.

Additional Key Words and Phrases: extended light
source, shadow volume, soft shadows, hard shadows,
interactive light modeling.

1  Introduction 

An important research area of computer graphics is the
simulation of realistic pictures. Reality is modeled by
observing and measuring its attributes. In a next step, the
models are rendered onto an image. In that sense, computer
graphics models the causes and renders the effects onto an
image. Computer vision, on the other hand, is interested in
analyzing an image. It tries to isolate certain effects in an
image in order to identify the causes. While the two processes
might seem to go in totally opposite directions, it is
interesting to consider how advances in one direction might
actually help the reverse process.

In computer vision, highlight information has been used
to determine light direction or local shape orientation. Babu
et al. [babu85] study contours of constant intensity in an
image to determine the orientation of planar surfaces under the
illumination of a directional light source. Buchanan [buch87]
fits ellipses to the highlights to obtain the same information
for planar surfaces illuminated by point light sources.

One important aspect of most of these algorithms consists
in identifying the highlight area. This is not an easy
task, as many of the algorithms for shape from shading [horn88]
require almost entirely diffuse surfaces.

When techniques are not restricted to diffuse surfaces,
they often rely on some kind of thresholding. The
unfortunate reality with thresholding is that different
threshold values can lead to noticeably different highlight
shapes and therefore to different shape/light recovery.
Other techniques, like Wolff’s use of polarization [wolf91], are
promising, although they require polarizing lenses
on the cameras capturing the scene.

Much useful information can also be extracted from the
shadow areas in an image [walt75] [shaf85]. These areas
provide additional information on the shape of the object
casting a shadow and even on the shape of the object on which
the shadow is cast. Moreover, they provide information on
the direction and the shape of the light sources.
Unfortunately, very little work has addressed recovering the
shape of an extended light source, as recovering shape from
shading under a directional or a point light source is already
a difficult task.

Shadows are not easy to extract from an image.
Detecting shadows can be done in a way similar to edge
detection, by applying various edge enhancing filters. For
extended lights, the shadow edges are soft and the shadow
must be detected based on changes in the gradients of the
shading. Gershon [gers87] uses gradients in color space to
determine whether a region corresponds to a shadow region or
simply to a change of material. Textures can also defeat
most of the techniques and must be carefully handled.

While modeling a scene, a user has access to important
information unavailable to computer vision, i.e. the
geometry of the scene and the viewing projection parameters. To
better understand a 3D scene, the user can therefore move
the camera around, use several views of the same scene at the
same time, move objects, and remove hidden surfaces, all
in real time. So far, however, few applications use
information about highlights and shadows to improve
the modeling step in computer graphics.

This paper investigates how highlight and shadow
information can be used to help a user define
the shape and position of a light source. It does not
preclude the previous ways of defining and positioning the light
sources, but enhances the whole process.
2 Defining and Manipulating Light Sources

With the advent of high performance graphics hardware,
it becomes possible to interactively create and manipulate
more and more complex models with a higher degree of
realism. Yesterday’s simple wireframe models can now be
replaced by flat shaded, Gouraud shaded and even
Phong shaded polygons, allowing for real time interaction
with the models. Hanrahan and Haeberli [hanr90]
demonstrate with their system how today’s graphics hardware can
be used to “paint” textures and various other surface
parameters (transparency, perturbation of surface normals, etc.)
in a fully interactive system. This increase in rendering
power makes it possible to investigate light
definition and manipulation from the highlights and
shadows a light produces.

2.1  Lights  from  Highlights 

In this section, the process of defining a light from its
highlights is described. Its advantages are demonstrated and its
restrictions explained so that one can better understand the
implications of using such a process.

Highlights are usually defined in the reflection models by
the specular term. Consider the specular term of Phong’s
shading [phon75] as expressed by Blinn [blin77]:

$(\vec{N} \cdot \vec{H})^n$    (1)

where $\vec{N}$ is the surface normal at a given point*,
      $\vec{H}$ is the bisector vector of the eye direction and
      the light direction,
      $n$ is the surface roughness coefficient.

This formulation tells us that for a given point on the
surface specified as the maximum intensity of the highlight,
a unique directional light source $\vec{L}$ can be determined as

$\vec{L} = 2(\vec{N} \cdot \vec{E})\vec{N} - \vec{E}$    (2)

where $\vec{E}$ is the eye direction.
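
As a rough illustration of this relationship: the specular term is maximal when the bisector coincides with the surface normal, which makes the light direction the mirror reflection of the eye direction about the normal. The Python sketch below assumes that reading; the function names are hypothetical and both input vectors are normalized internally.

```python
import math


def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def light_from_highlight(normal, eye):
    """Directional light that puts the highlight maximum at this point.

    Reflects the eye direction about the surface normal:
    L = 2 (N . E) N - E, with N and E normalized first.
    """
    n = normalize(normal)
    e = normalize(eye)
    k = 2.0 * dot(n, e)
    return tuple(k * nc - ec for nc, ec in zip(n, e))
```

A quick sanity check is that the normalized sum of the returned light direction and the eye direction (the bisector) equals the normal again.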

The term maximum intensity is not strictly correct if
we think of it in the context of a complete shading model.
However, we will use it here to mean maximizing equation
(1). It is interesting to note that other points on the surface
might reach this maximum but will never surpass it.

This simple relationship between the maximum
intensity of the highlight and light direction has been used in
the past. Hanrahan and Haeberli [hanr90] mention how
they can specify a light direction by dragging a highlight
on a sphere. This technique has also been previously
implemented in some modelers, like a light modeler developed in
1983 at NYIT by Paul Heckbert (manipulating highlights
on a sphere) and a light editor written by Richard Chuang
around 1985 at PDI, which was used, among others, to get
highlights to appear at the right time on flying logos. It also
came to the attention of the authors that a similar approach
to Chuang’s was used at Lucasfilm to get the glare to
appear at the crucial moment on a sword in the movie Young
Sherlock Holmes.

Our technique extends the basic approach in the above
systems by indirectly and interactively determining the
surface roughness coefficient n in relation with the size of the
highlight. Here is how it works.

* All vectors in this paper are assumed normalized.


Once the maximum intensity point of the highlight has
been chosen, the user drags the cursor away from this point.
At a new position on the surface, the surface normal is
computed. This new point is used to determine the boundary
of the highlight, i.e. where the specular term of (1) reaches
a fixed threshold t. To satisfy this threshold, n, the only
unknown, is easily computed from the surface normal $\vec{N}$
and bisector $\vec{H}$ at the new point as

$n = \frac{\log t}{\log(\vec{N} \cdot \vec{H})}$    (3)
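
A small numeric sketch of this step, assuming the boundary condition is that the specular term equals the threshold, so that the exponent is the ratio of the two logarithms. The function name and the example values (a dot product of 0.9 and a threshold of 0.5) are made up for illustration.

```python
import math


def roughness_from_threshold(n_dot_h, t):
    """Solve (N . H)^n = t for n, given the dot product at the boundary.

    Both arguments must lie strictly between 0 and 1, otherwise the
    logarithms are undefined or the ratio is meaningless.
    """
    if not (0.0 < n_dot_h < 1.0) or not (0.0 < t < 1.0):
        raise ValueError("n_dot_h and t must both be in (0, 1)")
    return math.log(t) / math.log(n_dot_h)


# Example: the user drags to a point where N . H = 0.9, threshold 0.5.
n = roughness_from_threshold(0.9, 0.5)
```

Dragging the boundary point closer to the maximum (a dot product nearer 1) yields a larger n, that is, a smaller and sharper highlight.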


While only these two points on a surface are necessary
to orient a directional light source and establish the surface
roughness coefficient, they give almost no information on the
shape of the highlight. To approximate the contour of the
highlight, the pixel with the maximum intensity is used as a
seed point and the neighboring pixels covered by this surface
are visited in a boundary fill fashion until pixels on both
sides of the threshold are identified or until the boundary
of the surface is found. With this technique, the second
point might not appear within the contour of the highlight
determined from the seed point. If this happens, the second
point is also used as a seed. Unfortunately, unless each pixel
covered by this surface is visited, some of the other highlights
produced by this light on this surface might be missed. If
the position of every highlight is necessary, the whole surface
is visited by the filling algorithm, but only on request from the
user, because such a request can lead to a considerable increase
in computation time.
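
The fill described above can be sketched as a breadth-first traversal over a per-pixel buffer. In this hedged Python sketch, the buffer holds the dot product of the normal and the bisector for each pixel, with `None` marking pixels not covered by the surface; that buffer layout, the 4-connectivity, and the function name are all assumptions for illustration.

```python
from collections import deque


def highlight_region(n_dot_h, seed, n, t):
    """Return the set of pixels connected to seed where (N . H)^n >= t.

    n_dot_h is a 2D list of floats, or None where the surface does not
    cover the pixel. The fill stops at sub-threshold pixels and at the
    surface boundary.
    """
    rows, cols = len(n_dot_h), len(n_dot_h[0])
    inside = set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        value = n_dot_h[r][c]
        if value is None or max(value, 0.0) ** n < t:
            continue  # off the surface or below threshold: contour reached
        inside.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return inside


# Tiny example buffer: a highlight around (1, 2) and an isolated
# bright pixel at (0, 4) that the fill from the seed cannot reach.
buffer = [
    [0.5, 0.5,  0.5,  0.5, 0.99],
    [0.5, 0.99, 1.0,  0.5, None],
    [0.5, 0.5,  0.98, 0.5, 0.5],
]
region = highlight_region(buffer, (1, 2), n=10, t=0.5)
```

The isolated bright pixel in the example is the situation the text warns about: a second highlight on the same surface is missed unless it is used as another seed or the whole surface is scanned.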

When n has already been determined for a given surface,
care must be taken in order to keep a unique value for n. If
another highlight is created on this surface, as soon as the
point with the maximum intensity is selected, the contour
of this new highlight is computed with the previous value
for n. However, this value for n and the position of the
highlights are not fixed and can be interactively changed because
some information is kept in a temporary frame buffer. In this
frame buffer, each previously visited pixel contains
information about its surface normal. The contour can therefore be
scaled down (i.e. a smaller highlight but a larger value for
n) very efficiently. If the contour is increased, only the
unvisited pixels need to have their surface normal determined.
Moving the contour on the surface is also possible, although
more expensive if the highlight is moved to a completely
different location on the surface, as many surface normals
might have to be computed. On some graphics hardware
like the VGX from SGI, information on the surface normals
can be obtained directly from the hardware and therefore
allows for even faster highlight manipulation. Figure 1 shows
the highlight produced by a directional light source over a
patch of the teapot. The white segment within the highlight
region represents the point of maximum intensity and points
towards the light direction.

Unfortunately, highlight information is dependent on the
eye position. Therefore, if the camera is moved, every
highlight in the scene must be recomputed. Also, the points of
maximum intensity are not valid any more and consequently
every surface has to be scanned to recover every highlight,
an expensive process that one should try to avoid as much as
possible. This also means that a highlight computed in one
window would have a different definition in another window
with a different projection. To avoid confusion and excessive
computing time, we decided to remove
all highlight information when the viewing parameters
are changed, although we keep the light definitions. These
highlights are recomputed on request from the user.
Figure 1: Creating a light by its highlight

Figure 2: Incomplete highlight information


Another limitation of using highlight information to
describe a light source resides in the fact that a highlight
specifies only a direction. We therefore need more constraints
to use it to determine other types of light source. Such
constraints exist for instance for polygonal light sources.
Assume a plane on which a polygonal light resides. By adding
a highlight, a direction is established. The intersection
between this direction and such a plane^ defines a point light
source, a vertex of a linear or polygonal light source.
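
The intersection mentioned above is an ordinary ray-plane intersection. The Python sketch below is one way to write it, under the assumption that the ray starts at the highlight point and follows the light direction; the function and parameter names are hypothetical. It returns `None` when no intersection exists, the case the footnote warns about.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def light_vertex_on_plane(point, direction, plane_point, plane_normal,
                          eps=1e-9):
    """Intersect the ray point + s * direction (s > 0) with a plane.

    Returns the intersection as a tuple, or None when the direction is
    parallel to the plane or the plane lies behind the starting point.
    """
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # direction parallel to the light plane
    offset = tuple(q - p for q, p in zip(plane_point, point))
    s = dot(offset, plane_normal) / denom
    if s <= 0.0:
        return None  # plane is behind the highlight point
    return tuple(p + s * d for p, d in zip(point, direction))
```

For a highlight point at the origin with the light direction along +z and a light plane at z = 5, the light vertex lands at (0, 0, 5).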

To represent highlights created by extended light sources,
the contribution of each vertex of the light is not sufficient
to determine the shape of the complete highlight. To display
this information, the boundary fill algorithm would have to
compute the specular integral for a linear light [poul91] or a
polygonal light [tana91] for each pixel to visit. Such integrals
are rather expensive to compute and, in order to achieve
real time, cheaper approximations based on precomputed
tables could be of some use here. We did not investigate
this approach in the context of this paper, relying solely
on the partial information provided by the light vertices as
shown in figure 2.

As can be observed, highlight information can be very
useful to specify directional light sources and surface
roughness coefficients. With extra constraints, highlights can even be
used to define point, linear and polygonal light sources,
although creating an arbitrary plane in 3D is not necessarily
an easy task. Another technique, more flexible for extended
light sources, consists in using the shadow information to
define the light sources.

^Note that there might not be any intersection.

2.2  Lights  from  Shadows 

Shadows are very important clues for understanding the
geometry of the scene and the interrelationship between
objects; in the context of this paper, shadows can reveal
important information about the nature of the light sources.
We will define light sources by manipulating their shadow
volumes.^ These shadow volumes have the advantage of
depending only on the positions of the lights and objects.
Therefore, as opposed to the case of the highlights, the camera
position can be changed without altering their description.
Moreover, their definition is consistent for every projection,
allowing for multiple windows open with different orthographic
and perspective projections, as used in most modeling
systems.

The shadow volume created by an object illuminated
by a directional light source consists of a sweep of the
object silhouette in the direction the light source shines. This
silhouette can be analytically determined for simple
primitives, computed for moderately complicated objects with
algorithms like the one in [bonf86], sampled by studying the
variation of surface normals at the vertices of a tessellated
object, or sampled using the information in a z-buffer projection
of this object. Specifying the direction of a directional light is
simply a question of choosing two arbitrary, although
different, points in the scene. The second point will be along
the shadow cast by the first one. To move this shadow
volume once defined, one needs to select a point on the shadow
volume. The point on the object casting this shadow is
then identified. By dragging the cursor to a new location,
a new direction is computed, the direction of a directional
light source. Figure 3 shows a cylinder illuminated by a
directional light source. For some primitives, computing the
exact silhouette can be expensive and not carry much more
information. In the case of this cylinder, each polygon
vertex forming the cylinder is simply projected in the direction
the light shines.

^A shadow volume formed by a single object and a directional or
a point light is the 3D volume within which every point is in shadow
of this object [crow77] [berg86]. For extended light sources (linear,
polygonal), the shadow volume is the 3D volume within which every
point is at least partly in shadow of this object.

Figure 4: Going from a directional light source (a) to a point
light source (b)

Figure 5: Umbra region (hatched) undetected in the
projection domain

A directional light source can be viewed as a point light
source at infinity. If the point light source is not at
infinity, the silhouette defining the shadow volume can
differ from the silhouette defined by a directional light source.
Figure 4 illustrates the process of going from a directional
light source (figure 4a) to a point light source (figure 4b) by
modifying its shadow volume.

A point sn1 on the shadow volume is chosen. The point
sn2 on the silhouette casting a shadow on the point sn1 is
identified. This shadow segment sn1 - sn2 will now be
considered as nailed, and the point light source will reside on
the line extending this segment. By selecting another point
s1 on the shadow volume, the point s2 casting the shadow
on this point is identified. The nailed segment sn1 - sn2
and the point s2 define a plane (sn1 - sn2 - s2). By moving
the cursor, a point s3 on this plane is located; s3 is now on
the shadow cast by s2. The point light source is therefore
moved to p_l, as shown in figure 4b.
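
One way to read this construction: the light lies both on the line through the nailed segment and on the line from the dragged shadow point back through the point casting that shadow, so its position is the intersection of two coplanar lines. The Python sketch below assumes that reading. The labels `sn1`, `sn2`, `s2`, `s3` are a best-effort naming of the four points, and the closest-point formula is a standard line-line computation, not something taken from the paper.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def point_light_position(sn1, sn2, s3, s2, eps=1e-12):
    """Intersect line (sn1 -> sn2) with line (s3 -> s2).

    sn1 -> sn2 is the nailed shadow segment (shadow point to silhouette
    point); s3 is the dragged shadow point and s2 the point casting it.
    Returns the light position, or None when the two lines are parallel
    (the light is then directional, i.e. at infinity).
    """
    d1 = sub(sn2, sn1)
    d2 = sub(s2, s3)
    w = sub(sn1, s3)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    u = (b * e - c * d) / denom  # parameter along the nailed line
    v = (a * e - b * d) / denom  # parameter along the second line
    p1 = tuple(p + u * q for p, q in zip(sn1, d1))
    p2 = tuple(p + v * q for p, q in zip(s3, d2))
    # Coplanar lines meet exactly; the midpoint absorbs rounding error.
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))
```

When the two shadow segments are parallel the function returns `None`, which corresponds to the directional-light case the text starts from.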

Once a point light source is created, it can be
manipulated in the scene by manipulating its shadow volume. This
can be done by fixing any shadow segment as previously
explained or, if no shadow segment is nailed, by adding a
new constraint to the system, for instance by assuming that the
distance d from the light p_l to the point casting a shadow
is constant. Combinations of these two techniques are sufficient
to position almost any point light source in a scene.

In some rare configurations of a scene, some positions
might not be accessible. For instance, assume a scene is
made of a single flat polygon and of a directional light
parallel to the plane of the polygon. In such a situation, the
light will never be able to escape the plane of the polygon.
Fortunately, this situation does not occur often in general
3D scenes, and so far combinations of moving the shadow
volumes with and without nailed segments have proved to be
sufficient to position our lights.

It is important to note that the point s2 might not lie
on the boundary of the shadow volume while the point light
source is moved around. However, the real shadow volume is
always displayed, so the user has a direct view of the altered
shadow.


To create extended light sources such as linear or polygonal
lights, new point light sources are needed, defining the vertices
of the light source. The shadow volume of each light vertex is
handled like that of a normal point light source, although for
polygonal light sources with more than three vertices, care must
be taken so that each light vertex resides on the light plane.

Shadows of extended light sources are formed by the
umbra and penumbra regions. The whole shadow region is
defined by the convolution of the object and the light source
in the projection domain [guib83]. The umbra is defined by
the intersection of all the shadow volumes (one shadow volume
per light vertex); the penumbra is the difference between the
whole shadow and the umbra. Nishita et al. [nish83] studied
the various parts of these shadow regions in 2D, once
projected onto polygonal surfaces for shadow culling purposes.
Some problems occur when neither the object casting the
shadow nor the light is limited to being convex. It can be
shown, however, that if both the light and the object are
divided into convex elements, the whole shadow is the union
in 3D of all the shadow convex hulls, as follows:

    For each convex light element
        For each convex object element
            Compute the convex hull of the shadow
            volumes created by these two elements
    Compute the 3D union of all these convex hulls

From now on, assume a polygonal convex light and a
convex object.
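
For a single sample point, the definitions above (umbra as the intersection of the per-vertex shadow volumes) can be sketched as a count of how many light vertices are hidden from that point. The Python sketch below is only an illustration under strong assumptions: the occluder is a sphere, chosen because the segment test is short, and the function names are hypothetical. A point that sees every light vertex is reported separately, since it may still lie in the penumbra when the occluder blocks only the interior of the light.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def segment_hits_sphere(p, q, center, radius):
    """True if the segment p-q passes within radius of center."""
    d = sub(q, p)
    f = sub(p, center)
    t = -dot(f, d) / dot(d, d)   # closest point on the infinite line
    t = min(1.0, max(0.0, t))    # clamp to the segment
    closest = tuple(pc + t * dc for pc, dc in zip(p, d))
    gap = sub(closest, center)
    return dot(gap, gap) <= radius * radius


def classify_point(point, light_vertices, center, radius):
    """Classify a point against the per-vertex shadow volumes.

    "umbra": hidden from every light vertex (inside all shadow volumes).
    "penumbra": hidden from some vertices but not all.
    "outside_vertex_shadows": sees every vertex; it may still be partly
    shadowed, because the whole shadow is the convex hull of the
    per-vertex volumes rather than their union.
    """
    blocked = sum(segment_hits_sphere(point, v, center, radius)
                  for v in light_vertices)
    if blocked == len(light_vertices):
        return "umbra"
    if blocked > 0:
        return "penumbra"
    return "outside_vertex_shadows"


# Triangular light at height 10 above a unit sphere centered at height 5.
light = [(-1.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 1.0, 10.0)]
sphere_center, sphere_radius = (0.0, 0.0, 5.0), 1.0
```

The vertex count is enough to decide the umbra here only because both the light and the occluder are convex, which is the assumption stated in the text.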

Assume  an  object  does  not  intersect  the  light  plane.  All 
the  shadows  lie  on  a  plane  parallel  to  the  light  plane  but 
located  at  infinity.  As  such,  2D  convex  hull  algorithms  can 
be  used  to  determine  which  part  of  the  shadow  volumes  form 
the  3D  convex  hull  of  the  shadow  volumes. 

However, computing the umbra region, i.e. the
intersection of the convex hulls for each light vertex, cannot be done
in 2D. Figure 5 shows an example where using only the
information in the 2D projection plane would fail to identify
the umbra region, shown hatched.

To recover the umbra region, one could intersect each
shadow polygon* of a light vertex shadow volume with each
other shadow volume of the other vertices of a single light.
This process can be very expensive as it is $O((ps)^2)$, where
p is the number of vertices of the light and s is the number
of shadow polygons forming the shadow volume. However,
some improvements can be obtained by first projecting the
shadow quadrilateral onto the plane containing the convex
hull on the 2D projection plane.

*The silhouette of the object can be discretized. Each point casts
its shadow in one direction. Two consecutive points on this
silhouette and their shadow directions define a quadrilateral with
two of its vertices at infinity.

Figure 3: Creating a directional light by its shadow

Since we use Graham's 2D convex hull algorithm [sedg90],
the points of the shadow quadrilateral, once projected in 2D,
are converted into pseudo angles, and an efficient combination
of angle comparisons and boxing allows for faster
intersection culling.
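
To make the pseudo-angle idea concrete, here is a hedged Python sketch of a Graham-style scan that orders points by a pseudo angle, a value that increases monotonically with the true angle but needs no trigonometry. The particular pseudo-angle formula and the function names are assumptions for illustration; the paper does not give its own formula.

```python
def pseudo_angle(dx, dy):
    """Monotonic stand-in for the angle of (dx, dy), in [0, 4).

    0 is +x, 1 is +y, 2 is -x, 3 is -y. (dx, dy) must not be (0, 0).
    """
    p = dx / (abs(dx) + abs(dy))
    return 3.0 + p if dy < 0 else 1.0 - p


def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Graham scan using pseudo angles; returns hull in CCW order.

    Collinear points on the hull boundary are dropped. Ties between
    collinear points rely on equal pseudo angles, which holds exactly
    for integer coordinates.
    """
    pts = sorted(set(points), key=lambda p: (p[1], p[0]))
    if len(pts) < 3:
        return pts
    pivot = pts[0]  # lowest point, leftmost on ties

    def key(p):
        dx, dy = p[0] - pivot[0], p[1] - pivot[1]
        return (pseudo_angle(dx, dy), dx * dx + dy * dy)

    hull = [pivot]
    for p in sorted(pts[1:], key=key):
        # Pop while the last two hull points and p do not turn left.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Because the pseudo angle only has to preserve ordering, the sort never evaluates an arctangent, which is the point of using it inside a loop that runs per shadow quadrilateral.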

This process could also be improved by using a different
data structure that might be more suitable for faster
intersections of half planes defined by the shadow
quadrilaterals. In object space, the binary space subdivision algorithm
handling shadow volumes as presented by Chin and Feiner
[chin89] would be a good candidate to investigate, while in
screen space the algorithm described by Fournier and Fussell
[four88] could be of use.

3  Results 

A very simple modeler has been implemented in order to
test the techniques presented in this paper. The modeler
includes primitives like conics (sphere, disk, cone, cylinder),
squares, cubes, triangular meshes and Bezier patches.
Figure 6 shows a global view of the modeler itself.

The code, far from being optimized, is written under
GL and was developed and tested on an Iris 4D/20 with
z-buffer. This machine handles a few primitives well (about 10),
but as the scene complexity increases, a 4D/240 VGX
becomes very handy. The VGX also allows for real time Phong
shading, which is very useful to model a scene and when
creating/manipulating shadows, but it can lead to some minor
difficulties when creating highlights, because the threshold
t must be adjusted to SGI’s Phong shading
implementation.

Figures 7 to 9 show a cone under a triangular light source.
At first, no convex hull is applied. In this image (figure 7),
it is easier to associate each shadow with a light vertex.
Once the convex hull is applied (figure 8), the silhouette
of the penumbra is easier to detect. Notice the umbra
region just under the cone, within the penumbra region. In
figure 9, the umbra and penumbra volumes are filled with
a semi-transparent mask. This representation gives a more
complete impression of the shadows that cannot really be
shown here with a single image.

4  Conclusion 

In this paper, we investigated using lighting effects, i.e.
highlights and shadows, to define the lights themselves and
specify their location. We showed some inherent limitations
of these approaches but also demonstrated a powerful
new technique. This technique allows a user to interactively
manipulate highlights and shadows, which can be very
important when designing a scene. In previous modeling
systems, these effects were too often neglected. Therefore a
user needed to iterate between rendering the whole scene
and modifying the lights. It is a process that can be
expensive depending on the quality of the rendering required.
Incorporating highlights and shadows in the modeling
process adds more information on the geometry of the scene
and its illumination, which should help the user to better
understand the scene before even rendering it.

Our system, although simple,* gives the user direct
information on the lighting effects during the modeling process,
since these effects are the objects being manipulated. This
direct manipulation is crucial, as getting the right effect by
manipulating the causes is generally more difficult than
manipulating the effects themselves.

We foresee that, as the graphics hardware improves and
as the CPU becomes faster, more and more effects available
once only at the rendering stage will become an inherent
part of the modeling stage itself. Real time Phong shading is
now becoming common with high-end modelers. These
improvements will lead us to investigate more intuitive ways of
defining and controlling these special effects. Although the
separation between computer graphics and computer vision
is still strong, we believe this will lead us to more and more
graphics in vision and more and more vision in graphics for
greater benefits to realism in graphics and scene analysis of
natural phenomena in vision.

Figure 9: Cone under a triangular light: Convex hull with filled shadows
ABSTRACT

The effect of shadow sharpness and shadow shape on the
perception of spatial relationships was studied in three
psychophysical experiments. In each experiment, the accuracy with
which subjects were able to perform spatial estimation tasks was
measured while either the sharpness or shape of the shadow was
varied.

The effects of shadow sharpness and shadow shape on the
accuracy of size and position estimations were tested in the first
and second experiments respectively using fixed scaling tasks.
Neither variations in shadow sharpness nor shadow shape had a
significant effect on the accuracy of performance in these
experiments.

The third experiment tested the effect of shadow sharpness on
the accuracy of performance in a shape matching task. In this
experiment, shadow sharpness had a significant effect on the
accuracy of performance, with soft edged shadows significantly
reducing the number of correct shape matches.

These  results  indicate  that  less  physically  accurate  hard 
edged  shadow  rendering  techniques  may  be  preferable  in  tasks 
requiring  accurate  perception  of  an  object's  shape. 

CR Categories and Subject Descriptors:

D.2.2 [Software Engineering]: Tools and Techniques - User
interfaces;
information processing;

I.3.6 [Computer Graphics]: Methodology - interaction
techniques;

I.3.7 [Computer Graphics]: Three dimensional graphics and
realism - color, shading, shadowing, and texture.
1. INTRODUCTION

One of the difficult decisions facing the designers of
applications for the interactive viewing and manipulation of virtual
spaces is determining the combination of rendering techniques
to use for the generation of displays. Each rendering technique
provides a subset of the perceptual cues used in determining
spatial relations. The job of the designer is to maximize the
spatial information perceived by the user, without exceeding the
computational limitations of real-time image generation in the
target computing environment.

Wanger, Ferwerda, and Greenberg [6], [7] ran several
formal psychophysical experiments to measure the relative effects
of a number of spatial cues on the performance of several spatial
manipulation tasks in a virtual space. One result of their
experiments was that shadows had a significant positive effect on
the performance of tasks requiring the determination of an
object's position and size.

Although these results indicate that shadows are a powerful
cue for determining spatial relationships in many tasks, they do
not address the effect of the quality of the shadow on the
perception of the space. Since accurate shadow generation is
computationally expensive, it would be useful to understand the
repercussions of various shadow approximations on the perception
of spatial relationships. This paper describes three
psychophysical experiments conducted to measure the effect of shadow
sharpness and shadow shape on the perception of spatial
relationships in static computer generated images.

2. THE EXPERIMENTS

This section describes the experiments performed. Specific
details on the methods used for each of the experiments can be
found in Appendix A.

2.1 EXPERIMENT 1: EFFECT OF SHADOW SHARPNESS
ON THE PERCEPTION OF OBJECT SIZE AND POSITION

The first experiment tested the effect of shadow sharpness
on the perception of object size and position estimations in a
fixed scaling task. In each trial subjects were presented with a
display of a virtual room on a monitor (Figure 1). Four blue
lines were displayed on the floor of the room to provide a scale
for object depth (two at the front and two at the rear of the
room), and four yellow lines were displayed on the back wall to
provide a unitless scale for object height. Five balls, increasing
linearly in radial size from left to right, were displayed on the
floor near the front of the room to provide a scale for object
size. Additionally, a sixth ball, the test ball, was suspended in
the room. In each trial subjects were asked to type the answers
to the following three questions on a keyboard:

1. Using the blue lines at the front and back of the
floor of the room as depths of 0.0 and 10.0 respectively,
what is the depth of the test object?

2. Using the yellow lines at the bottom and top of the
back wall of the room as heights of 0.0 and 10.0
respectively, what is the height of the test object?

3. Using the right-most object in the line of objects at
the front of the room as a size of 1 and the left-most as a
size of 5, what is the size of the test object?

In  addition  to  varying  the  size,  height,  and  depth  of  the  test 
ball,  one  of  the  following  three  shadow  sharpness  levels  was 
used  for  the  test  ball  in  each  trial  (Figure  2): 

1. No shadows - The test object did not cast a
shadow.

2. Hard shadows - The test object cast a shadow with
a sharp boundary (i.e. no penumbral region).

3. Soft shadows - The test object cast a complete
shadow with both umbral and penumbral regions
accurately rendered.
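
As an illustration of the difference between the hard and soft levels, the following Python sketch generates light sample points in the way Appendix A describes: a single sample at the center of the area light for hard shadows, and a jittered 3x3 grid for soft shadows. It is only a sketch under stated assumptions, not the renderer used in the experiments; the function names, the `occluded` callback, and the placement of the light are hypothetical and chosen for the example.

```python
import random


def light_samples(center, size, soft, rng=None):
    """Sample points on a square area light parallel to the xy plane.

    Hard shadows: one sample at the light's center.
    Soft shadows: one jittered sample in each cell of a 3x3 grid.
    """
    cx, cy, cz = center
    if not soft:
        return [(cx, cy, cz)]
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    n = 3
    cell = size / n
    x0, y0 = cx - size / 2.0, cy - size / 2.0
    return [
        (x0 + (i + rng.random()) * cell, y0 + (j + rng.random()) * cell, cz)
        for i in range(n)
        for j in range(n)
    ]


def lit_fraction(point, samples, occluded):
    """Fraction of light samples visible from `point`.

    `occluded(point, sample)` is a hypothetical shadow-ray test supplied by
    the caller; it returns True when the ray to the sample is blocked.
    """
    visible = sum(1 for s in samples if not occluded(point, s))
    return visible / len(samples)
```

With a single sample the lit fraction at any surface point is either 0 or 1, which gives the sharp shadow boundary. With nine samples it can take intermediate values, which is what produces a penumbral region.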

Twelve subjects were each run through 56 trials representing
one trial for each of the combinations of shadow level, test
ball size, test ball depth, and test ball height.

2.2 EXPERIMENT 2: EFFECT OF SHADOW SHAPE ON
THE PERCEPTION OF OBJECT SIZE AND POSITION

The second experiment tested the effect of shadow shape
on the perception of object size and position in a fixed scaling
task. The displays for Experiment 2 utilized the same virtual
room as Experiment 1. However, barbells were used instead of
balls for both the size scale objects and the test object (Figure
3). For each trial subjects were asked the same three questions
as those asked in Experiment 1.

In addition to varying the size, height, and depth of the test
object, one of the following three shadow shape levels was used
for the test object in each trial (Figure 4):

1. No shadows - The test object did not cast a
shadow.

2. Bounding volume shadows - The test object cast a
shadow based on its rectangular bounding volume.
This produced normal looking objects with "boxy"
shadows.

3. "True" shadows - The test object cast a shadow
based on its actual geometry to produce properly shaped
shadows.

Twelve subjects were each run through 5 trials representing
one trial for each of the combinations of shadow shape level,
test ball size, test ball depth, and test ball height.


2.3 EXPERIMENT 3: EFFECT OF SHADOW SHARPNESS
ON THE PERCEPTION OF OBJECT SHAPE

Experiment three tested the effect of shadow sharpness on
the perception of object shape in a shape matching task. The
displays used in Experiment 3 consisted of two windows (Figure
5). In the lower window five objects of revolution, numbered
from 1 to 5 from left to right, were represented.
Each of these shapes was unique in both shape and height,
but all were identical as viewed from the base along their major
axis. The top window displayed a plane with the test object
suspended above it. The test object in the upper window was one of
the shapes from the lower window rotated such that only its base
could be seen. In each trial subjects were asked to type the
identifying number of the shape from the lower window which
corresponded to the test object in the upper window. In addition to
changing the light position, the shape of the test object, and the
elevation of the test object above the plane, either the hard or
soft shadow sharpness level described in Experiment 1 was
used in each trial.

Twelve subjects were each run through 90 trials representing
one trial for each of the combinations of shadow sharpness
level, test object elevation, test object shape, and light position.
In addition, 5 trials without shadows were added as a control
condition to verify that the test object's shadow was the only cue
provided regarding the test object's shape. This brought the total
to 95 trials per subject.
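
The trial counts above follow from a full factorial design, which the short Python sketch below enumerates. The number of levels per factor is taken from the text (two sharpness levels, five shapes) and from Section 3.3 (three light positions, three elevations); the level labels themselves, and the assumption that the five control trials show each shape once, are illustrative and not stated in the paper.

```python
from itertools import product

# Level labels are illustrative; only the number of levels per factor is
# taken from the text (2 sharpness levels, 3 elevations, 5 shapes, 3 lights).
SHARPNESS = ("hard", "soft")
ELEVATIONS = ("low", "middle", "high")
SHAPES = (1, 2, 3, 4, 5)
LIGHTS = ("front", "middle", "back")

# One trial for every combination of the four factors.
shadow_trials = list(product(SHARPNESS, ELEVATIONS, SHAPES, LIGHTS))

# Assumption: the five control trials show each shape once without a shadow.
control_trials = [("none", None, shape, None) for shape in SHAPES]

all_trials = shadow_trials + control_trials
chance_level = 1.0 / len(SHAPES)  # probability of guessing the right shape

print(len(shadow_trials), len(all_trials), chance_level)
```

The enumeration gives 2 x 3 x 5 x 3 = 90 shadow trials and 95 trials in total, and a chance level of 20% for a five-way shape match.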

3. EXPERIMENTAL RESULTS

This section describes the results of the experiments
performed. Details on the methods used to analyze the experiments
can be found in Appendix B. A detailed listing of the
quantitative results of the analysis can be found in [6].

3.1 EXPERIMENT 1

Statistical analysis of the results of Experiment 1 indicated
that the size, depth, and height of the test ball were all
significant factors in size, depth, and height estimation, but that
shadow sharpness was not a significant factor in any of the three
estimation tasks. Although the existence of a shadow greatly
increased the accuracy of size and position (height and depth)
estimations, the sharpness of the shadow did not have a
statistically significant effect on the accuracy of the estimations.

A small but statistically significant interaction
(F(2,10)=4.296, p=0.0353)* between shadow sharpness and test
object elevation was seen in the position estimation task. When
the test object was at the middle elevation (one inch above the
ground plane) hard shadows significantly increased the accuracy
of positional estimations. The reason for this anomaly is
unknown.


*The F statistic is a measure of the variation in a set of
observations due to a particular experimental factor. The p value
is the probability that the amount of variation seen for a factor
in the data could have arisen merely by random variation.

3.2 EXPERIMENT 2

Similar to the results of Experiment 1, the size, depth, and
height of the test object were all significant factors in the size
and position estimations in Experiment 2, but shadow shape did
not have a significant effect on the accuracy of either size or
position estimations. Although shadow shape by itself was not a
significant factor, higher order interactions between shadow
shape and other factors (such as shadow shape by light position
and shadow shape by test object elevation) indicate that shadow
shape may have some subtle effect on the perception of size and
position. It is unclear what these effects may be, as post hoc
analysis showed no clear directional trends for these
interactions.

3.3 EXPERIMENT 3

All of the factors tested in Experiment 3 had a significant
effect on the determination of object shape (object shape, light
position, object elevation, and shadow sharpness: F(4,8)=4.829,
p=0.028; F(2,10)=10.603, p=0.003; F(2,10)=32.440, p<0.001; and
F(1,11)=16.234, p=0.002, respectively).

The shape matching task appears to be dependent on
identifying features in the shape. Since each of the factors tested had
some effect on the prominence of the features, it is logical that
all of the factors were significant.

The position of the light affected the shape of the resulting
shadow. The percentages of correct responses were 47.9%,
78.4%, and 73.6% for the front, middle, and back light
positions respectively. The increase in accuracy
when the light was in the middle and back positions is explained
by the fact that the light's normal came close to being
perpendicular to the test object's major axis in these positions. This
increased shape matching accuracy as the differences in the shape
of shadow contours for the various shapes became more
pronounced as the light's normal moved towards being
perpendicular to the test object's major axis.

The elevation of the object above the ground plane also
affected the prominence of object features when soft shadows
were present. As the test object moved higher above the plane,
the object's shadow became more diffuse, blurring identifying
features. This explains why the percentage of correct shape
matches decreased from 74.2%, to 70.1%, to 55.7% as the
object moved from the lowest to the highest position.
The percentage of correct matches stayed well above chance
(20%) for all three elevations as some identifying features, such
as the aspect ratio of the shadow, were visible in even the most
diffuse shadows.

Support for the use of identifying features is seen in the
fact that incorrect responses were not distributed uniformly
among different shape pairs, but instead were concentrated
between specific pairs of shapes. Confusion between shapes 1 and
2 (the ball and pear - Figure 6) accounted for 18.6% of all
incorrect responses, while confusion between shapes 3 and 5 (the cup
and capsule) accounted for 65.9% of all of the incorrect
responses. In both of these cases incorrect matches occurred when
the identifying feature of one object was mimicked by the other
object. For example, some projections of the flat top of the cup
shape produced a shadow with two curved ends, much like the
shadow of the capsule shape. The lack of identifying features was
compounded even further by perspective foreshortening, causing
the cup to often be mistaken for the capsule and vice versa.

Perhaps  the  most  dramatic  result  is  the  fact  that  82.6%  of 
all  incorrect  matches  took  place  in  trials  where  soft  shadows 
were  present.  It  is  clear  from  this  result  that  soft  shadows  can  be 
detrimental  to  determining  an  object's  shape  in  the  absence  of 
other  cues. 

4.  CONCLUSION 

Although  it  is  likely  that  the  patterns  seen  in  these  experi¬ 
ments  would  be  present  in  many  other  situations,  one  must  be 
careful  in  extrapolating  the  results  of  perceptual  experiments 
such  as  those  represented  here.  In  order  to  allow  these  experi¬ 
ments  to  be  accurately  controlled  and  measured  they  were  nec¬ 
essarily  contrived.  Assuming  that  these  results  are  applicable  to 
other  situations  the  following  conclusions  can  be  reached: 

1. These experiments support the earlier result that
shadows are indeed a useful cue for indicating the size and
position of objects. In addition, shadows can be a powerful
cue for indicating an object's three dimensional shape.

2. It appears that the sharpness of a shadow does not
have any appreciable effect in tasks based on the
perception of the size and position of an object; however, soft
shadows can have a strong negative effect in tasks
requiring accurate perception of object shape.

3. Although the shape of a shadow has no appreciable
effect on the perception of object size and position,
higher order interactions indicate that it cannot be
completely ignored.

These results indicate that in many cases, computationally
cheaper hard shadow generation techniques are adequate and in
fact may actually be more beneficial than more expensive soft
shadow techniques.

APPENDIX A: EXPERIMENTAL SETUP

The same twelve subjects, six men and six women,
participated in each of the three experiments. Six of the subjects were
experienced in using three dimensional computer graphics and
six were not. The order of presentation of the three experiments
was varied among the subjects to eliminate ordering bias. All
subjects were either graduate students or faculty members at
Cornell University and had normal or corrected to normal
vision.


Displays were pre-computed using stochastic ray tracing,
and were displayed on an HP 98752A 19 inch color monitor
under controlled lighting conditions. Color and brightness were set
by the experimenter and held constant for all trials.

Displays were rendered to correspond to the physical area
of a 6 inch by 6 inch window located in the center of the
monitor. The window subtended 9.5 degrees of visual angle both
horizontally and vertically. The camera was set to be coincident
with the eye point of the subjects looking towards the center of
the virtual room, with the frustum of view corresponding to the
physical space of the monitor, and proper perspective projection
for a viewing distance of 18 inches. The scene was illuminated
by a white ambient light source and a 2.5 inch by 2.5 inch,
uniformly distributed, white area light source with its normal
parallel to the view vector.

In Experiments 1 and 3, shadow sharpness levels were
created by varying the number of sample points used for the light
source for shadow ray light intersection testing. Hard edged
shadows were rendered using a single sample point located at
the center of the area light. Soft edged shadows were rendered
using a jittered 3x3 grid of sample points. Each grid was then
sampled for light visibility testing. Images without shadows
were produced by ignoring shadow rays which only intersected
the test object.

In Experiment 2 shadow shape levels were created by
varying the geometry of the object used to test object shadow
ray intersections. Shadow rays were intersected with the test
object to render properly shaped shadows, and were intersected
with the test object's bounding box to create bounding volume
shadows. Images without shadows were produced by ignoring
shadow rays which only intersected the test object.

A more rigorous description of the experimental setup can
be found in [6].

APPENDIX B: STATISTICAL METHODS

Results were analyzed using a multivariate analysis of
variance (MANOVA), with a significance level cut-off of p<0.05.
Post hoc tests for the direction of effects were performed with
two tailed matched pairs T-tests.

In all three experiments trials without shadows were treated
as control conditions and were left out of the final analysis.
Subjects performed at chance on non-shadow trials in all three
experiments, verifying that subjects were making their spatial
estimations solely on the basis of the shadow information.

Detailed descriptions of the methods used can be found in
[6].


Abstract

An optoelectronic head-tracking system for head-mounted
displays is described. The system features a scalable work area
that currently measures 10' x 12', a measurement update rate of
20-100 Hz with 20-60 ms of delay, and a resolution
specification of 2 mm and 0.2 degrees. The sensors consist of
four head-mounted imaging devices that view infrared
light-emitting diodes (LEDs) mounted in a 10' x 12' grid of modular
2' x 2' suspended ceiling panels. Photogrammetric techniques
allow the head's location to be expressed as a function of the
known LED positions and their projected images on the
sensors. The work area is scaled by simply adding panels to
the ceiling's grid. Discontinuities that occurred when changing
working sets of LEDs were reduced by carefully managing all
error sources, including LED placement tolerances, and by
adopting an overdetermined mathematical model for the
computation of head position: space resection by collinearity.
The working system was demonstrated in the Tomorrow's
Realities gallery at the ACM SIGGRAPH '91 conference.

CR categories and subject descriptors: I.3.1
[Computer Graphics]: Hardware Architecture -
three-dimensional displays; I.3.7 [Computer Graphics]:
Three-Dimensional Graphics and Realism - Virtual Reality

Additional  Key  Words  and  Phrases:  Head-mounted 
displays,  head  tracking 


1  Introduction 

It is generally accepted that deficiencies in accuracy,
resolution, update rate, and lag in the measurement of head
position can adversely affect the overall performance of a HMD
[17][24][25]. Our experience suggests that an additional
specification requires more emphasis: range.

Figure 1: The existing system in UNC's graphics laboratory


Most existing HMD trackers were built to support situations
that do not require long-range tracking, such as cockpit-like
environments where the user is confined to a seat and the range
of head motion is limited. But many virtual worlds
applications, such as architectural walkthroughs, would benefit
from more freedom of movement (Figure 2). Long-range
trackers would allow greater areas to be explored naturally, on
foot, reducing the need to resort to techniques such as flying or
walking on treadmills.

Such techniques of extending range work adequately with
closed-view HMDs that completely obscure reality. With
see-through HMDs [9][11], however, the user's visual connection
with reality is intact and hybrid applications are possible
where physical objects and computer-generated images coexist.
In this situation, flying through the model is meaningless. The
model is registered to the physical world and one's relationship
to both must change simultaneously.

This paper describes the second generation of an
optoelectronic head tracking concept developed at the
University of North Carolina at Chapel Hill. In the concept's
first generation, the fundamental design parameters were
explored and a bench top prototype was constructed [28].
Building on this success, the second generation tracker is a
fully functional prototype that significantly extends the
workspace of an HMD wearer.


Figure 2: Walkthrough of Brooks' kitchen design that runs with
the tracker. Actual resolution of images seen in the HMD is
much lower than this picture's resolution.


The current system (Figure 1) places four outward-looking
image sensors on the wearer's head and locates LEDs in a 10' x
12' suspended ceiling structure of modular 2' x 2' ceiling
panels. Each panel houses 32 LEDs, for a total of 960 LEDs in
the ceiling. Images of LEDs are formed by lateral-effect
photodiode detectors within each head-mounted sensor. The
location of each LED's image on a detector, or
photocoordinate, is used along with the known LED locations
in the ceiling to compute the head's position and orientation.
To enhance resolution, the field of view of each sensor is
narrow. Thus, as shown in Figures 3 and 7, each sensor sees
only a small number of LEDs at any instant. As the user moves
about, the working set of visible LEDs changes, making this a
cellular head-tracking system.
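
The LED total quoted above follows directly from the panel geometry, and the same arithmetic shows how the count grows when panels are added. The Python sketch below is a minimal illustration; the function name and the enlarged 12' x 14' ceiling are hypothetical, while the panel size and the 32 LEDs per panel come from the text.

```python
def ceiling_layout(width_ft, length_ft, panel_ft=2, leds_per_panel=32):
    """Return (panel_count, led_count) for a ceiling tiled with square panels."""
    if width_ft % panel_ft or length_ft % panel_ft:
        raise ValueError("ceiling dimensions must be whole multiples of the panel size")
    panels = (width_ft // panel_ft) * (length_ft // panel_ft)
    return panels, panels * leds_per_panel


print(ceiling_layout(10, 12))   # current 10' x 12' ceiling
print(ceiling_layout(12, 14))   # hypothetical enlarged ceiling
```

For the current ceiling this gives 30 panels and 960 LEDs, matching the figure in the text; the hypothetical enlarged ceiling would need 42 panels and 1344 LEDs.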

Measurements of head position and orientation are produced at
a rate of 20-100 Hz with 20-60 ms of delay. The system's
accuracy has not been measured precisely, but the resolution is
2 mm and 0.2 degrees. It was demonstrated in the Tomorrow's
Realities gallery at the ACM SIGGRAPH '91 conference, and is,
to our knowledge, the first demonstrated scalable head-tracking
system for HMDs.

The system is novel for two reasons. First, the sensor
configuration is unique. Other optical tracking systems fix the
sensors in the environment and mount the LEDs on the moving
body [30]. The outward-looking configuration is superior because
it improves the system's ability to detect head rotation. The
scalable work space is the system's second contribution. If a
larger work space is desired, more panels can be easily added to
the overhead grid.


2  Previous  work 

Many tracking systems precede this effort, and we will briefly
survey representative examples. The essence of the problem is
the realtime measurement of the position and orientation of a
rigid moving body with respect to an absolute reference frame,
a six-degree-of-freedom (6DOF) measurement problem.
Solutions are relevant to many other fields.

To  our  knowledge,  four  fundamentally  different  technologies 
have  been  used  to  track  HMDs:  mechanical,  magnetic, 
ultrasonic,  and  optical. 

The first HMD, built by Ivan Sutherland [27], used a mechanical
linkage to measure head position. A commercial product, The
Boom [12], uses a mechanical linkage to measure the gaze
direction of a hand-held binocular display. The Air Force
Human Resources Laboratory (AFHRL) uses a mechanical
linkage to measure the position and orientation of a HMD used
for simulation [24]. Mechanical systems have sufficient
accuracy, resolution, and frequency response, yet their range is
severely limited, and a mechanical tether is undesirable for
many applications.

Magnetic-based systems [3][21] are the most widely used hand
and head trackers today. They are small, relatively
inexpensive, and do not have line-of-sight restrictions. Their
primary limitations are distortions caused by metal or
electromagnetic fields, and limited range [13].

Ultrasonic approaches have also been successful, such as the
commercially-available Logitech tracker [20]. Time-of-flight
measurements are used to triangulate the positions of sensors
mounted on the HMD. The strength of this technology is
minimum helmet weight [13]. Physical obscuration as well as
reflections and variations of the speed of sound due to changes
in the ambient air density make it difficult to maintain accuracy
[5].

Because of the potential for operation over greater distances,
optical approaches are plentiful, and it is helpful to categorize
them on the basis of the light source used. Visible, infrared,
and laser light sources have each been exploited.

Ferrin [13] reports the existence of a prototype helmet tracking
system using visible light. Although it only tracks
orientation, it is worth mentioning here because of its unique
approach. A patterned target is placed on the helmet and a
cockpit-mounted video camera acquires images in real time. The
pattern is designed to produce a unique image for any possible
head orientation. The strength of this approach is the use of
passive targets which minimize helmet weight. Reflections
and other light sources are potential sources of error.

Bishop's Self-Tracker [7] is a research effort involving visible
light. A Self-Tracker chip senses incremental displacements
and rotations by imaging an unstructured scene. A
head-mounted cluster of these chips provides sufficient information
for the computation of head position and orientation.
Although still under development, the concept is mentioned
here because it would allow an optical tracking system to
operate outdoors, where a structured environment, such as our
ceiling of LEDs, would be impossible to realize.

Because of the difficulties associated with processing
information in an unstructured scene, most high-speed optical
measurement systems use highly-structured infrared or laser
light sources in conjunction with solid-state sensors. The
sensor is often a lateral-effect photodiode as opposed to a true
imaging device, because the photodiode produces currents that
are directly related to the location of a light spot's centroid on
its sensitive surface [32]. The resultant sensor is relatively
insensitive to focus, and the light spot's location, or
photocoordinate, is immediately available without the need for
image processing.
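
To make the relation between photodiode currents and the photocoordinate concrete, the Python sketch below applies the commonly used idealized model for a two-dimensional lateral-effect detector, in which the normalized difference of the two electrode currents along each axis gives the spot position relative to the detector center. This formula is a textbook idealization and is not taken from this paper; the function name, the electrode naming, and the 12 mm side length used in the example are assumptions, and a real detector would need calibration for nonlinearity.

```python
def psd_photocoordinate(i_x1, i_x2, i_y1, i_y2, side_mm):
    """Estimate the light spot centroid (x, y) in mm from electrode currents.

    Idealized model: along each axis the photocurrent divides between the two
    opposing electrodes according to the spot's position, so the normalized
    current difference gives the position relative to the detector center.
    Dividing by the sum removes the dependence on spot intensity.
    """
    sum_x = i_x1 + i_x2
    sum_y = i_y1 + i_y2
    if sum_x <= 0 or sum_y <= 0:
        raise ValueError("no photocurrent: light spot not detected")
    x = 0.5 * side_mm * (i_x2 - i_x1) / sum_x
    y = 0.5 * side_mm * (i_y2 - i_y1) / sum_y
    return x, y


# Example with made-up currents (arbitrary units) on an assumed 12 mm detector.
print(psd_photocoordinate(1.0, 3.0, 2.0, 2.0, side_mm=12.0))
```

Because only current ratios enter the result, no image has to be scanned or thresholded, which is why the photocoordinate is available without image processing.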

During the 1970's, Selspot [23] popularized the use of infrared
LEDs as targets and lateral-effect photodiodes as sensors in a
commercially-available system. Their primary emphasis was,
and still is, on the three-dimensional locations of individual
targets. That is, the Selspot system does not automate the
computation of a rigid body's orientation. In response to this
shortcoming, Antonsson [2] refined the Selspot system for use
in dynamic measurements of mechanical systems. The
resultant system uses two Selspot cameras to view a moving
body instrumented with LEDs. Similar approaches have been
applied to HMD systems in cockpits [13] and in simulators

The use of an LED light source limits the range of these
systems. Typically, the distance between source and detector
can be no greater than several feet. Longer distances can be
spanned with laser light sources.

The only known example of a 6DOF tracker using laser sources
is the Minnesota Scanner [26]. With this system, scanning
mirrors are used to sweep orthogonal stripes of light across the
working volume. Photodiodes are both fixed in space and
placed on the moving body. By measuring the time between a
light stripe's contact with a fixed and moving photodiode, the
diode's three-dimensional location can be computed. Given the
location of three or more moving diodes, the moving body's
orientation can be computed. Similar technology has been
applied to the cockpit, although orientation was the only
concern [13].


3 System overview

Wang demonstrated the viability of head-mounted lateral-effect
photodiodes and overhead LEDs. This system extends his work
in several ways. First, an overhead grid of 960 LEDs was
produced with well-controlled LED location tolerances, and
more attention was paid to controlling other error sources as
well. Second, mathematical techniques were developed that
allow an arbitrary number of sensors and an arbitrary number of
LEDs in the field of view of each sensor to be used in the
computation of head location. This resulted in an
overdetermined system of equations which, when solved, was
less susceptible to system error sources than the previous
mathematical approach [10]. Third, the analog signals
emerging from the sensors were digitally processed to reject
ambient light. Finally, techniques for quickly determining the
working sets of LEDs were developed.

3.1 Sensor configuration

Typically, optical trackers are inward-looking: sensors are
fixed in the environment within which the HMD wearer moves.
With Self-Tracker, Bishop and Fuchs introduced the concept of
outward-looking trackers that mount the image sensors on the
head, looking out at the environment (Figure 3).


Figure 3: Conceptual drawing of outward-looking system and
the sensors' fields of view


If  a  large  work  area  is  required,  outward-looking  configurations 
have  an  advantage  over  inward-looking  techniques  when 
recovering  orientation.  The  two  are  equivalent  for  measuring 
translation:  moving  the  sensor  causes  the  same  image  shift  as 
moving  the  scene.  Rotations  are  significantly  different. 
Unless  targets  are  mounted  on  antlers,  an  inward-looking 
sensor  perceives  a  small  image  shift  when  the  user  performs  a 
small  head  rotation.  The  same  head  rotation  creates  a  much 
larger  image  shift  with  a  head-mounted  sensor.  For  a  given 
sensor  resolution,  an  outward-looking  system  is  more 
sensitive  to  orientation  changes. 
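
The argument about rotation can be illustrated numerically with a simple pinhole model. The Python sketch below compares the image shift produced by the same small head rotation in the two configurations. It is only a sketch under stated assumptions: the geometry (a beacon starting on the optical axis, a head-mounted target starting on the line toward a fixed camera), the function names, and every numeric value in the example are chosen for illustration and are not taken from the paper.

```python
import math


def outward_image_shift(focal_mm, rotation_deg):
    """Image shift (mm) of a fixed beacon, initially on the optical axis,
    when a head-mounted pinhole sensor rotates by `rotation_deg`."""
    return focal_mm * math.tan(math.radians(rotation_deg))


def inward_image_shift(focal_mm, rotation_deg, target_radius_m, camera_distance_m):
    """Image shift (mm) seen by a fixed camera when a target mounted
    `target_radius_m` from the head's rotation center, initially on the line
    toward the camera, rotates with the head by `rotation_deg`."""
    theta = math.radians(rotation_deg)
    lateral = target_radius_m * math.sin(theta)          # sideways motion of target
    depth = camera_distance_m - target_radius_m * math.cos(theta)
    return focal_mm * lateral / depth                    # pinhole projection


# Assumed values: 50 mm lens, 1 degree head rotation, target 0.1 m from the
# rotation center, camera 3 m from the rotation center.
out_shift = outward_image_shift(50.0, 1.0)
in_shift = inward_image_shift(50.0, 1.0, 0.1, 3.0)
print(f"outward: {out_shift:.3f} mm, inward: {in_shift:.3f} mm, "
      f"ratio: {out_shift / in_shift:.1f}")
```

In this model the inward-looking shift scales with the ratio of target radius to camera distance, so it only approaches the outward-looking shift when the targets are mounted far from the rotation center, which is the point of the remark about antlers.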


Figure  4;  Remote  Processor  and  head  unit  with  four  sensors 


To  improve  resolution  in  general,  long  focal  lengths  must  be 
used  with  an  optical  sensor  regardless  of  whether  the 
configuration is inward or outward-looking. Thus, a wide-angle
lens cannot significantly extend the work area of an
inward-looking system without sacrificing resolution and accuracy.


Narrow fields of view are a consequence of long focal lengths.
Therefore, the HMD wearer cannot move very far before an LED
leaves a given sensor's field of view. One solution is a cellular
array of either LEDs or detectors. For an infrared system using
LEDs and lateral-effect photodiodes, system cost is minimized
by replicating LEDs as opposed to sensors. This is a result of
the device cost as well as the required support circuitry.

In the current system, four Hamamatsu (model S1880) sensors
are mounted atop the head, as shown in Figure 4. Each sensor
consists of a camera body to which a Fujinon lens (model CF
50B) is attached. The focal length of each lens is 50 mm. Their
principal points were determined experimentally by an optical
laboratory. An infrared filter (Tiffen 87) is used to reject
ambient light.

3.2  Beacon  configuration 

Experience with simulations and an early 48-LED prototype
revealed the problem of beacon switching error: as the user
moved around and the working set of beacons changed,
discontinuous jumps in position and orientation occurred.
These are caused by errors in the sensor locations, distortions
caused by the lens and photodiode detector, and errors in the
positions of the beacons in the ceiling.

To control beacon locations, we housed the LEDs in carefully
constructed ceiling panels. Each 2' x 2' panel is an anodized
aluminum enclosure that encases a 20" x 20" two-sided printed
circuit board. On this board are electronics to drive 32 LEDs.
The LEDs are mounted in the front surface with standard plastic
insets. Using standard electronic enclosure manufacturing
techniques, it was relatively easy to realize an LED-to-LED
centerline spacing tolerance of .003" on a given panel.

The panels are hung from a Unistrut superstructure (Figure 1).
At each interior vertex of a 2' x 2' grid, a vertically adjustable
hanger mates with four panels. Four holes in the face of a panel
slide onto one of four dowels on each hanger. The entire array
of panels is levelled with a Spectra Physics Laser-Level, which
establishes a plane of visible red light several inches below the
panels' faces. Each hanger is designed to accept a sensor
(Industra-Eye) that measures the vertical position of the laser
relative to its own case. By moving the hangers up or down,
they can be aligned to within .006" of the light beam.

The panels are electrically connected by a data and power daisy
chain. The data daisy chain allows an individual LED to be
selected. Once selected, the LED (Siemens SFH 487P) can be
driven with a programmable current that ranges from 0-2
amperes. The programmable current allows an electronic iris
feature to be implemented. Typically, an LED will be on for no
more than 200 µsec. During this time period, the current is
adjusted to achieve a desired signal level at the sensor (see
Section 4).


The LED Manager is a 68030-based processing module that
controls the Remote Processor as well as the ceiling. A
TAXI-based serial datalink [1] provides access to the Remote
Processor while the ceiling's data daisy chain terminates at the
LED Manager. Software executing on this module is
responsible for turning LEDs on and for extracting data from
the sensors. The LED Manager resides in a remote VME chassis
that must be located near the ceiling structure.


[Figure 5 diagram: ceiling panels with LEDs; head-mounted sensors and
Remote Processor; data link; 68030-based LED Manager; 68030-based
Queue Manager; i860-based Collinearity processor]

Figure 5: System Dataflow


For each measurement of head location, the LED Manager
produces a list of visible LEDs and their associated
photocoordinates. This list is transferred via shared memory to
the Collinearity module, which resides in the graphics engine's
VME chassis. The i860-based Collinearity module translates
the list of photocoordinates into the current estimate of head
location. For reasons explained in Section 6, an additional
68030-based processor is used to aid the transfer of data from
the remote system to the host. In theory, this is not required.
The VME systems are connected by a Bit-3 VME buslink.


The sampled head position is communicated to the Pixel-Planes
5 graphics engine [14], which in turn updates the images on the
user's displays.


4  Low-level  software 


Data  Flow 

As shown in Figure 5, the signals emerging from the
head-mounted sensors are connected to the Remote Processor. Worn
as a belt pack, the Remote Processor functions as a remote
analog-to-digital conversion module. It can accept the four
analog voltages emerging from a lateral-effect photodiode, for
up to eight sensors. On command, the Remote Processor will
simultaneously sample the four voltages on a selected sensor
and relay four 12-bit results to the LED Manager. The Remote
Processor was used to alleviate the need for long runs of analog
signals emerging from multiple sensors.


A library of low-level routines running on the LED Manager,
called the Acquisition Manager, controls the beacons and
detectors. Given an LED and a photodiode unit, these routines
light an LED and determine if a photodiode's detector sees that
LED. The detector returns four analog signals, which the
Remote Processor board digitizes. A simple formula [16]
converts these four numbers into the x,y photocoordinates of
the LED's projection on the detector.
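The paper does not reproduce the formula from [16]. As a rough
illustration only, the sketch below assumes the common normalized-difference
form used with lateral-effect photodiodes, in which each axis has a pair of
opposing electrode signals. The function name, the argument order, and the
half_width scale are assumptions, not details of the actual system.

```python
def photocoordinates(x1, x2, y1, y2, half_width=1.0):
    """Convert four electrode signals into an (x, y) position.

    Assumed model: position along each axis is the normalized difference
    of the two opposing electrode signals, scaled by the detector half-width.
    Returns None when there is no usable signal on either axis.
    """
    sum_x = x1 + x2
    sum_y = y1 + y2
    if sum_x <= 0 or sum_y <= 0:
        return None  # nothing detected (e.g. LED outside the field of view)
    x = half_width * (x2 - x1) / sum_x
    y = half_width * (y2 - y1) / sum_y
    return (x, y)
```

Because each coordinate is a ratio, it is independent of overall signal
strength in this model, which is one reason the LED current can be varied
without moving the computed position.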

Hamamatsu datasheets specify 1 part in 40 accuracy and 1 part
in 5000 resolution for the lateral-effect diode-based detectors
used. As with Antonsson [2], we were able to achieve
approximately 1 part in 1000 accuracy for the combined
photodiode-lens assembly. Achieving this result required
significant efforts to improve the signal-to-noise ratio and
compensate for distortion, including:

Ambient light rejection: The voltage values with the LED off
(called the "dark current") are subtracted from the voltage values
with the LED on. Sampling with the LED off both before and
after the samples with the LED on and averaging the two yields
substantially improved ambient light rejection.

Random noise rejection: Averaging several measurements
reduces random noise effects, but costs time. A good
compromise between accuracy and sampling speed is to take 8
samples with the LED off, 16 samples with the LED on and 8
more samples with the LED off.
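The two rejection steps above can be sketched for a single detector channel
as follows. This is only an illustration under the stated 8/16/8 sampling
pattern; the function name and the use of plain Python lists for the
digitized voltages are assumptions.

```python
def ambient_corrected(off_before, on, off_after):
    """Return the LED-only signal for one detector channel.

    off_before, off_after: samples taken with the LED off (dark readings),
    bracketing the 'on' samples in time.
    on: samples taken with the LED lit.
    Averaging within each group reduces random noise; averaging the two
    dark groups estimates the ambient level at the time of the 'on' samples.
    """
    def mean(samples):
        return sum(samples) / len(samples)

    dark = 0.5 * (mean(off_before) + mean(off_after))
    return mean(on) - dark


# Example: ambient light drifts upward from 0.10 to 0.30 during the measurement.
signal = ambient_corrected([0.10] * 8, [1.20] * 16, [0.30] * 8)
```

Bracketing the lit samples with dark samples means a linear drift in ambient
light cancels, which a single dark reading taken beforehand would not achieve.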

Current scaling: The distance between a photodiode and an LED
depends on the user's location. To maximize the signal without
saturating the photodiode detector, the Acquisition Manager
dynamically adjusts the amount of current used to light an LED.
Acquisition Manager routines estimate the threshold of current
that will saturate the detector and use 90% of this value during
sampling.


Figure  6:  Optical  bench  for  photodiode  calibration 


Calibration:  Both  the  lens  and  the  photodiode  detector  suffer 
from  nonlinear  distortions.  By  placing  the  photodiodes  on  an 
optical  bench  and  carefully  measuring  the  imaged  points 
generated  by  beacons  at  known  locations  (Figure  6),  we  built  a 
lookup table to compensate for these distortions. Bilinear
interpolation provides complete coverage across the detector.
More sophisticated calibration techniques should be
investigated. Accurate calibration is required to reduce beacon
switching error.
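A lookup table with bilinear interpolation might look like the sketch below.
The table layout (a regular grid of (dx, dy) corrections), the grid origin
and spacing, and the function name are assumptions made for illustration; the
paper does not describe the actual table format.

```python
import math


def correct_photocoordinate(table, x, y, origin=0.0, step=1.0):
    """Apply a bilinearly interpolated correction to a raw photocoordinate.

    table[row][col] holds the (dx, dy) correction measured at the grid point
    (origin + col * step, origin + row * step). Points outside the grid are
    handled by extrapolating from the nearest cell.
    """
    gx = (x - origin) / step
    gy = (y - origin) / step
    # Index of the lower-left corner of the enclosing cell, clamped to the grid.
    col = min(max(math.floor(gx), 0), len(table[0]) - 2)
    row = min(max(math.floor(gy), 0), len(table) - 2)
    u, v = gx - col, gy - row  # fractional position inside the cell

    def blend(k):
        low = (1 - u) * table[row][col][k] + u * table[row][col + 1][k]
        high = (1 - u) * table[row + 1][col][k] + u * table[row + 1][col + 1][k]
        return (1 - v) * low + v * high

    return (x + blend(0), y + blend(1))
```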

Programming  techniques:  Techniques  such  as  list  processing, 
cache  management  and  efficient  code  sequencing  result  in  a 
substantially  improved  sampling  rate.  In  addition,  expedited 
handling  of  special  cases,  such  as  when  an  LED  is  not  within 
the  field  of  view  of  a  photodiode  unit,  further  helps  system 
performance. 

Using 32 samples per LED, we compute a visible LED's
photocoordinate in 660 µsec and reject a non-visible LED in
100 µsec. LEDs are tested in groups; each group carries an
additional overhead of 60 µsec.
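These figures give a simple cost model for one sampling pass. The sketch
below just adds them up; the function name and the assumption that the three
costs are strictly additive are illustrative, not taken from the paper.

```python
VISIBLE_US = 660        # time to compute one visible LED's photocoordinate
REJECT_US = 100         # time to reject one non-visible LED
GROUP_OVERHEAD_US = 60  # extra overhead per group of LEDs tested


def sampling_time_us(visible, rejected, groups):
    """Estimated sampling time in microseconds for one pass."""
    return (visible * VISIBLE_US
            + rejected * REJECT_US
            + groups * GROUP_OVERHEAD_US)


# Example: 10 visible LEDs, 5 rejected LEDs, tested in 4 groups.
total = sampling_time_us(10, 5, 4)  # 6600 + 500 + 240 = 7340 microseconds
```

The model shows why rejected LEDs are not free: testing many beacons that
turn out to be invisible still consumes a noticeable share of each frame.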


Figure 7: Sensors viewing LEDs in the ceiling. Each of the four
groups is the set of LEDs that a sensor can see. Picture taken
with a camera that is sensitive to infrared light.


5  LED  Manager 

The LED Manager uses the low-level Acquisition Manager
routines to determine which LEDs each photodiode unit sees
and where the associated imaged points are on the photodiode
detectors. We usually want to collect data from all visible
LEDs, since larger sample sets ultimately yield less noisy
solutions from the Collinearity module (Section 7). Because
the number of visible LEDs is small (see Figure 7) compared to
the total number of LEDs in the ceiling, something faster than
a brute-force scan of the entire ceiling array is called for. Two
assumptions help us design a more efficient method:

1) Spatial coherence: The set of beacons visible to a
photodiode unit in a given frame will be contiguous.

2) Temporal coherence: The user's movement rate will be slow
compared to the frame rate. This implies that the field of view
of a given photodiode unit does not travel very far across the
ceiling between frames, so its set of visible beacons will not
change much from one frame to the next.

5.1  The  basic  method 

In each frame, the LED Manager goes through each photodiode
unit in sequence, sampling beacons until it is satisfied that it
has captured most of each photodiode unit's visible set. A
basic difficulty is that we cannot be sure whether a beacon is
visible or not until we attempt to sample it. The LED Manager
remembers which beacons were in the camera's visible set from
the previous frame. The set is called the last visible set. If the
last visible set is nonempty, all beacons in that set are tested.
The next action depends on how many of those beacons are
actually visible:

1) All: We assume the field of view has not moved much and
not many more beacons will be visible. We stop with this set
and go on to the next photodiode unit.

2) Some: We assume that the field of view has shifted
significantly, possibly enough to include previously unseen
beacons. A shell fill (described later) is conducted, beginning
with the set of beacons verified to be visible.

3) None: The field of view has moved dramatically, gone off
the edge of the ceiling, or is obscured. We check the neighbors
of the last visible set. If any of these beacons are visible, they
are used to start a shell fill. If none are visible, we give up on
this photodiode unit until the next frame.

What if the last visible set is empty? Our course of action
depends on whether we were able to compute a valid position
and orientation for the head in the last frame:

1) Valid previous location: We can predict which LEDs should
be visible to our photodiode unit, if the user's head is actually
at the computed location, because the geometry of the head unit
is known. If no LEDs are predicted to be visible, we go on to
the next photodiode unit; otherwise we sample those beacons
and use them as the start of a shell fill, if any of them were
actually visible.

2) No valid previous location: Now we have no way to guess
which beacons are visible, so we resort to a simple sweep
search, which lights the beacons in the ceiling row by row,
until we have tried the entire ceiling or an LED is found to be
visible. In the former case, we give up, and in the latter case,
we use the visible beacon as the start of a shell fill.
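The per-unit decision described in this subsection can be summarized as a
small dispatch function. The sketch below only names the next step; the
function name, the action labels, and the use of Python sets for beacon
identifiers are all hypothetical.

```python
def next_action(last_visible, verified, have_valid_pose):
    """Pick the next step for one photodiode unit in the current frame.

    last_visible: set of beacons seen by this unit in the previous frame.
    verified: the subset of last_visible found visible when re-tested now.
    have_valid_pose: True if a valid head location was computed last frame.
    """
    if last_visible:
        if verified == last_visible:
            return "stop"                        # case "All"
        if verified:
            return "shell_fill_from_verified"    # case "Some"
        return "test_neighbors_of_last_set"      # case "None"
    if have_valid_pose:
        return "test_predicted_beacons"          # predict from head geometry
    return "sweep_search"                        # row-by-row scan of ceiling
```

Ordering the checks this way means the expensive sweep search is reached only
when there is neither a previous visible set nor a usable pose estimate.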

5.2 Shell fill

A  shell  fill  starts  with  a  set  of  beacons  known  to  be  visible  to  a 
sensor  and  sweeps  outward  until  it  has  found  all  the  beacons  in 
the  field  of  view. 

We do this by first sampling the neighbors of the initial set of
beacons. If none are found visible, the shell fill terminates,
concluding that the beacons in the initial set are the only
visible ones. If any are found visible, we then compute the
neighbors of the beacons we just sampled, excluding those
which have already been tried, and sample those. We repeat
this process of sampling beacons, computing the neighbors of
those found visible, and using those neighbors as the next
sample set, until an iteration yields no additional visible
beacons.

Assumption  1,  that  visible  sets  are  contiguous,  suggests  that 
this  procedure  should  be  thorough  and  reasonably  efficient. 
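A shell fill is essentially a breadth-first flood fill in which visibility
can only be learned by sampling. The sketch below is one way to write it,
assuming beacons are identified by (row, column) grid positions, that
"neighbors" means the eight surrounding grid positions, and that a caller
supplies an is_visible function that samples a beacon. These names and the
grid size are illustrative assumptions.

```python
def grid_neighbors(beacon, rows=10, cols=10):
    """Eight-connected neighbors of a beacon on a rows x cols ceiling grid."""
    r, c = beacon
    return {(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < rows and 0 <= c + dc < cols}


def shell_fill(seeds, is_visible, neighbors=grid_neighbors):
    """Grow outward from beacons known to be visible until a shell adds none."""
    visible = set(seeds)
    tried = set(seeds)
    frontier = set(seeds)  # beacons found visible in the latest shell
    while frontier:
        candidates = set()
        for beacon in frontier:
            candidates |= neighbors(beacon)
        candidates -= tried                  # never sample a beacon twice
        tried |= candidates
        frontier = {b for b in candidates if is_visible(b)}
        visible |= frontier
    return visible
```

The cost is one sample per visible beacon plus one sample per beacon on the
ring just outside the field of view, which is why contiguity of the visible
set (Assumption 1) matters.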

5.3 Startup

At startup, the head location is not known and all of the last
visible sets are empty. We do a sweep search, as previously
described, for each photodiode unit to locate the initial visible
sets.


6  Communications 

Communication  between  the  various  processors  in  our  system 
is  done  using  shared  memory  buffers,  which  offer  low  latency 
and  high  speed.  The  buffers  are  allocated  and  deallocated  via  a 
FIFO  queue  mechanism.  Data  is  "transmitted"  when  it  is  written 
to  the  buffer:  no  copying  is  necessary.  The  only 
communication  overhead  is  the  execution  of  a  simple 
semaphore  acquisition  and  pointer  management  routine. 
Furthermore,  all  processors  use  the  same  byte  ordering  and  data 
type  size,  so  no  data  translation  is  needed. 


The  queuing  mechanism  lets  all  modules  in  the  system  run 
asynchronously.  LED  Manager,  the  Collinearity  module,  and 
Pixel-Planes  5  run  as  fast  as  they  can,  using  the  most  recent 
data  in  the  queue  or  the  last  known  data  if  the  queue  is  empty. 
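The consumer-side rule (use the newest queued item, or fall back to the last
known one) can be illustrated with a small single-threaded sketch. The real
system uses shared memory buffers guarded by semaphores; the class below,
including its name and methods, is a hypothetical stand-in that shows only
the selection logic.

```python
from collections import deque


class LatestValueQueue:
    """Hands a consumer the most recent item, or the last known one if empty."""

    def __init__(self, initial=None):
        self._pending = deque()
        self._last_known = initial

    def put(self, item):
        """Producer side: append a new result without waiting for the consumer."""
        self._pending.append(item)

    def get(self):
        """Consumer side: take the newest pending item and discard older ones."""
        if self._pending:
            self._last_known = self._pending[-1]
            self._pending.clear()
        return self._last_known
```

With this rule a slow consumer skips stale samples instead of falling behind,
and a fast consumer simply reuses the previous sample rather than blocking.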

The  various  processors  in  our  system  are  split  between  two 
separate  VME  buses,  which  are  transparently  linked  together 
by Bit-3 bus link adapters (Figure 5). A subtle bus loading
problem prevents the i860 board and the '030 board that runs
LED Manager from operating in the same VME cage. This
configuration  increases  latency  because  inter-bus  access  is 
significantly  slower  than  intra-bus  access,  but  increases 
throughput  because  the  bus  link  allows  simultaneous  intra-bus 
activity  to  occur.  Because  the  i860  processor  cannot  directly 
access  the  VME  bus,  a  second  '030  board,  which  runs  the  Queue 
Manager,  moves  data  between  the  LED  Manager  and  the 
Collinearity  module. 

A  simpler  and  less  expensive  system  could  be  built  if  we 
acquired  an  i860  board  that  can  run  on  the  same  bus  as  the  LED 
Manager  '030  board.  This  configuration  would  not  require  the 
Queue  Manager  board  or  the  Bit-3  links  and  would  reduce  both 
latency  and  throughput. 


7  Space  Resection  by  Collinearity 

Given  the  observations  of  beacons,  we  compute  the  position 
and  orientation  of  the  user's  head  by  using  a  photogrammetric 
technique  called  space  resection  by  collinearity.  The  basic 
method  for  a  single  camera  is  in  [31];  what  we  describe  here  is 
our  extension  for  using  it  in  a  multi-sensor  system.  Because  of 
space  limitations,  the  description  is  necessarily  brief.  Full 
details  are  provided  in  [6]. 

7.1  Definitions 

Three types of coordinate systems exist: one World space (tied
to the ceiling structure), one Head space (tied to the HMD), and
several Photodiode spaces (one for each photodiode unit).




Figure  8:  World,  Head  and  Photodiode  spaces 

Changing representations from one space to another is done by
a rotation followed by a translation. We use two types of 3x3
rotation matrices:

$M$ = Head space to World space
$M_i$ = Photodiode space $i$ to Head space

with each matrix specified by Euler angles $\omega$, $\alpha$, and $\kappa$.

The optical model for each photodiode unit is simple: a light
ray strikes the front principal point and leaves the rear
principal point at the same angle (Figure 9).


Figure  9:  Optical  model 


Finally, we list the points and vectors we will need, segregated
by the coordinate system in which they are represented. Given
that photodiode unit $i$ sees LED number $j$:

Photodiode space:

$[x_{ij}, y_{ij}, 0]$ = imaged point on photodiode detector

Head space:

$t_{ij}$ = vector from rear principal point to imaged point
$H_0$ = origin of Head space
$d_i$ = vector from $H_0$ to center of photodiode detector
$e_i$ = vector from $H_0$ to rear principal point
$f_i$ = vector from $H_0$ to front principal point

World space:

$[X_0, Y_0, Z_0]$ = coordinates of the origin of Head space
$[X_j, Y_j, Z_j]$ = coordinates of LED $j$
$T_{ij}$ = vector from LED $j$ to front principal point

7.2  Geometric  relationships 

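Figure 9 shows that $T_{ij}$ and $t_{ij}$ differ only by a scale factor; if
they were placed at the same start point, they would be
collinear. In equations:

$$T_{ij} = \lambda M t_{ij} \qquad (1)$$

We now express $T_{ij}$ and $t_{ij}$ in terms of the other vectors in
equations (2) and (3) and Figures 10 and 11:

$$T_{ij} = \begin{bmatrix} X_0 - X_j \\ Y_0 - Y_j \\ Z_0 - Z_j \end{bmatrix} + M f_i \qquad (2)$$

$$t_{ij} = d_i - e_i + M_i \begin{bmatrix} x_{ij} \\ y_{ij} \\ 0 \end{bmatrix} \qquad (3)$$

Figure 10: Expressing $T_{ij}$ through other vectors

Figure 11: Expressing $t_{ij}$ through other vectors

Substituting (2) and (3) into (1) yields the collinearity
condition equation $c_{ij}$:

$$c_{ij}: \quad \begin{bmatrix} X_0 - X_j \\ Y_0 - Y_j \\ Z_0 - Z_j \end{bmatrix} + M f_i = \lambda M \left( d_i - e_i + M_i \begin{bmatrix} x_{ij} \\ y_{ij} \\ 0 \end{bmatrix} \right)$$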

7.3  System  of  equations 

When a photodiode unit $i$ sees an LED $j$, it generates a $c_{ij}$, which
represents three independent equations. If we see $N$ LEDs in all,
the total number of unknowns in our system is $6+N$: 3 for
position, 3 for orientation, and $N$ scale factors. The first six
are what we are trying to find, but we do not care about the scale
factors. We eliminate these by rearranging the $c_{ij}$ equations,
then dividing the first and second equations by the third. This
leaves two independent equations, of the form

$$G1_{ij}(L) = 0, \qquad G2_{ij}(L) = 0$$

where $L$ is a vector composed of the six unknowns: position
$(X_0, Y_0, Z_0)$ and orientation ($\omega$, $\alpha$, $\kappa$ for matrix $M$). We
generate a linear approximation to these two equations by
applying Taylor's theorem:

$$-G1_{ij}(L) = \frac{\partial G1_{ij}(L)}{\partial X_0} dX_0 + \frac{\partial G1_{ij}(L)}{\partial Y_0} dY_0 + \frac{\partial G1_{ij}(L)}{\partial Z_0} dZ_0 + \frac{\partial G1_{ij}(L)}{\partial \omega} d\omega + \frac{\partial G1_{ij}(L)}{\partial \alpha} d\alpha + \frac{\partial G1_{ij}(L)}{\partial \kappa} d\kappa$$

and a similar expansion for the linearized G2 equation.


Now we have six total unknowns, and every LED that we see
generates two independent linear equations. Thus, we need to
see at least three LEDs. If we see a total of $N$ LEDs, we can write
our system of $N$ linearized G1 equations and $N$ linearized G2
equations in matrix form:

$$\underbrace{-G_0}_{2N \times 1} = \underbrace{\partial G}_{2N \times 6} * \underbrace{D}_{6 \times 1} \qquad (4)$$

where $D = [dX_0, dY_0, dZ_0, d\omega, d\alpha, d\kappa]^T$,

$\partial G$ is the matrix of partial derivatives of the G1 and G2,

and $-G_0$ contains the values of the G1 and G2 at a specific $L$.

7.4  Iteration  and  convergence 
Collinearity takes an initial guess of L (the unknowns) and
generates correction values (in D) to make a more accurate L,
iterating until it converges to a solution. Thus, we need to
extract D from equation (4). If N = 3, then we can solve for D
directly. If N > 3, then the system is overdetermined and we
approximate D through singular value decomposition [24].
Simulations show that using more than the minimum of 3 LEDs
can reduce average error caused by non-systematic error
sources. In pseudocode, our main loop is:

Generate an initial guess for L
repeat
    Given L, compute G0 and dG
    Estimate D using singular value decomposition
    L = L + D
until magnitude of D is small
return L
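The loop above can be sketched in Python as follows. This is a minimal
sketch under several assumptions that are not from the paper: the Euler
convention M = Rz(kappa) Ry(alpha) Rx(omega); one particular rearrangement
of the collinearity condition (rotating T into Head space before dividing by
the third component); a finite-difference approximation of dG rather than
analytic partial derivatives; and a normal-equations solve substituted for
singular value decomposition so that only the standard library is needed.
All function names are hypothetical.

```python
import math


def rotation(omega, alpha, kappa):
    """Head-to-World rotation M = Rz(kappa) Ry(alpha) Rx(omega) (assumed convention)."""
    co, so = math.cos(omega), math.sin(omega)
    ca, sa = math.cos(alpha), math.sin(alpha)
    ck, sk = math.cos(kappa), math.sin(kappa)
    return [[ck * ca, ck * sa * so - sk * co, ck * sa * co + sk * so],
            [sk * ca, sk * sa * so + ck * co, sk * sa * co - ck * so],
            [-sa, ca * so, ca * co]]


def head_ray(L, led, f):
    """T of equation (2) rotated into Head space: M^T (H0 - LED) + f."""
    M = rotation(L[3], L[4], L[5])
    w = [L[0] - led[0], L[1] - led[1], L[2] - led[2]]
    return [sum(M[r][c] * w[r] for r in range(3)) + f[c] for c in range(3)]


def residuals(L, observations):
    """Stack G1 and G2 for each observation (led, f, t); all zero at the true L."""
    g = []
    for led, f, t in observations:
        u = head_ray(L, led, f)
        g.append(u[0] / u[2] - t[0] / t[2])  # G1: first component over third
        g.append(u[1] / u[2] - t[1] / t[2])  # G2: second component over third
    return g


def solve_least_squares(columns, rhs):
    """Solve (J^T J) D = J^T rhs by Gauss-Jordan; columns[k] is column k of J."""
    n = len(columns)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    A = [[dot(columns[i], columns[j]) for j in range(n)] + [dot(columns[i], rhs)]
         for i in range(n)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))  # partial pivoting
        A[i], A[p] = A[p], A[i]
        for r in range(n):
            if r != i:
                m = A[r][i] / A[i][i]
                A[r] = [x - m * y for x, y in zip(A[r], A[i])]
    return [A[i][n] / A[i][i] for i in range(n)]


def resect(L, observations, tol=1e-9, max_iter=50, h=1e-6):
    """Iterate L = L + D until the magnitude of the correction D is small."""
    L = list(L)
    for _ in range(max_iter):
        g0 = residuals(L, observations)
        columns = []
        for k in range(6):  # finite-difference column of dG for unknown k
            Lk = list(L)
            Lk[k] += h
            gk = residuals(Lk, observations)
            columns.append([(a - b) / h for a, b in zip(gk, g0)])
        D = solve_least_squares(columns, [-v for v in g0])
        L = [a + d for a, d in zip(L, D)]
        if math.sqrt(sum(d * d for d in D)) < tol:
            break
    return L
```

Each observation is a tuple of the LED's World coordinates, the sensor's
front principal point f in Head space, and the measured vector t in Head
space. The normal-equations solve is less robust than singular value
decomposition when the beacon geometry is poorly conditioned, so it should be
read as a stand-in for the paper's method rather than a replacement.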

How  do  we  generate  the  initial  guess  of  L?  Normally  we  use  the 
last  known  position  and  orientation,  which  should  be  an 
excellent  guess  because  we  track  at  rates  up  to  100  Hz. 
Collinearity usually converges in 1 or 2 iterations when the
guess is close. But in degenerate cases (at system startup, or
when we lose tracking because the photodiode units are pointed
away from the ceiling), we have no previous L. Collinearity
will not converge if the guess is not close enough to the true
value: we empirically found that being within 30° and several
feet of the true L is a good rule of thumb. So in degenerate
cases,  we  draw  initial  guesses  for  L  from  a  precomputed  lookup 
table  with  120  entries,  trying  them  sequentially  until  one 
converges.  We  can  double-check  a  result  that  converges  by 
comparing  the  set  of  LEDs  used  to  generate  that  solution  to  the 
theoretical  set  of  LEDs  that  the  photodiode  units  should  see,  if 
the  head  actually  was  at  the  location  just  computed.  When 
these  two  sets  match,  we  have  a  valid  solution. 


8  Performance 

A  “typical  situation"  is  defined  as  a  user  of  average  height 
standing  erect  underneath  the  ceiling,  with  at  least  three 
photodiode  units  aimed  at  the  ceiling,  moving  his  head  at 
moderate  speeds.  All  measurement  bounds  assume  that  the  user 
remains  in  tracker  range  with  at  least  two  sensors  aimed  at  the 
ceiling. 

Update rate: The update rate ranges between 20-100 Hz. Under
typical situations, 50-70 Hz is normal, depending on the
height of the user. The wide variation in the number of LEDs
seen by the sensors causes the variation in update rate. The
more LEDs used, the slower the update rate, because LED
Manager is the slowest step in the pipeline. If the head
remains still and the sensors see a total of B beacons, LED
Manager requires 3.33 + 0.782*B ms to run. Rapidly rotating
the head increases this time by a factor of about 1.33, since
additional time is required to handle the changing working sets
of LEDs. Slower head movement rates have correspondingly
smaller factors.
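Since LED Manager is the slowest pipeline stage, its run time sets the update
rate. The sketch below evaluates the timing formula above; the function names
and the rotation_factor parameter (a multiplier for head movement, 1.0 when
the head is still) are illustrative assumptions.

```python
def led_manager_ms(beacons, rotation_factor=1.0):
    """LED Manager run time in milliseconds for a given number of beacons."""
    return (3.33 + 0.782 * beacons) * rotation_factor


def update_rate_hz(beacons, rotation_factor=1.0):
    """Update rate implied by the LED Manager run time."""
    return 1000.0 / led_manager_ms(beacons, rotation_factor)


still_10 = update_rate_hz(10)  # 11.15 ms per pass
still_20 = update_rate_hz(20)  # 18.97 ms per pass
```

With 10 visible beacons and a still head this gives roughly 90 Hz, and with
20 beacons roughly 53 Hz, which is consistent with the 50-70 Hz range quoted
for typical situations.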

Lag: Lag varies between 20-60 ms, with 30 ms being normal
under typical situations. Lag is measured from the time that
LED Manager starts to the time when the Collinearity module
provides a computed head location to the graphics engine.
Therefore, tracker latency is a function of the number of LEDs
seen and the quality of the initial guess provided to the
Collinearity module. As B gets smaller, both the LED Manager
and Collinearity modules become faster, reducing latency. This
mutual dependence on B means that update rate and lag are
closely tied: faster update rates correspond with lower latency
values.

Resolution: When moving the head unit very slowly, we
observed a resolution of 2 mm in position and 0.2 degrees in
orientation. Measuring accuracy is much harder, and we do not
have any firm numbers for that yet. At SIGGRAPH '91, users
were able to touch a chair and the four ceiling support poles
based solely on the images they saw of models of the chair and
the poles in the virtual environment.


9  Evaluation 

The system provides adequate performance but has several
limitations and problems that must be addressed. The most
noticeable is the combination of excessive head-borne weight
and limited head rotation range. Rotation range depends
heavily on the user's height and position under the ceiling. A
typical maximum pitch range near the center of the ceiling is
45 degrees forward and 45 degrees back. When the user walks
near an edge of the ceiling, head rotation range becomes much
more restricted. To accommodate the full range of head motion,
multiple image sensors must be oriented such that wherever the
head is pointed, two or more sensors are able to view LEDs on
the ceiling. Given the current focal lengths, simulations show
that as many as eight fields of view are required for a
respectable rotation range [29]. The weight of each sensor
must be significantly reduced to achieve this goal.

To reduce weight, we are trying to replace the current lenses (11
oz. each) with smaller, lighter lenses (2 oz. each). Other
approaches are possible. Wang proposed optically
multiplexing multiple fields of view onto a single
lateral-effect photodiode [29]. Reduced signal strength, distortions,
and view identification ambiguities make this a nontrivial
task. It may be easier to design a helmet with integral
photodiodes and lenses. Given that each photodiode is about
the size of a quarter, the entire surface of a helmet could be
studded with sensors.

Beacon switching error has been greatly reduced, but not
eliminated. Small observable discontinuities occasionally
occur, and while they are not a major disturbance, they are
annoying. Calibration techniques are being explored to
estimate error sources and compensate for their effects.
Photogrammetric techniques like the bundle adjustment method
[8] or an alternate scheme suggested by our colleagues [18] may
provide the answer.




Infrared  light  sources  in  the  environment  surrounding  the 
tracker,  such  as  sunlight  or  incandescent  light,  must  be 
controlled  for  the  system  to  operate  correctly.  Specifically, 
any light source whose wavelengths include 880 nm will be
detected by the photodiodes as if it were an LED. For this
reason,  fluorescent  ambient  lighting  is  preferred.  Extreme 
caution  is  not  required,  however.  Whereas  a  sensor  pointed 
directly  at  an  infrared  light  source  other  than  the  LEDs  will 
confuse  the  system,  a  certain  level  of  indirect  infrared 
background  light  is  tolerable  due  to  the  combination  of  optical 
filters  and  the  ambient  light  rejection  techniques  described  in 
Section  4. 

Surprisingly,  the  bottleneck  in  the  system  is  the  time  required 
to  extract  data  from  the  photodiode  detectors,  not  the  time 
required  to  compute  the  head's  location.  The  i860  processor 
performs  the  latter  task  adequately,  and  even  faster  and  cheaper 
processors  will  be  available  in  the  future.  But  getting  accurate 
photocoordinates  from  the  detectors  takes  longer  than 
expected, because of the time spent in current scaling and in
sampling  multiple  times  per  LED.  Further  experimentation  is 
required  to  see  if  we  can  safely  reduce  the  number  of  samples. 
Optimizing  the  low-level  software  may  improve  sampling 
speed  by  20-30%. 

The use of Euler angles in the collinearity equations opens the
possibility of gimbal lock. The current system avoids this
because the head rotation range is too limited to reach gimbal
lock positions, but a future version may. If we cannot place the
gimbal lock positions out of reach, we can solve for the nine
rotation matrix parameters individually, subject to six
constraints that keep the matrix special orthogonal, or we may
be able to recast the rotations as quaternions.
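As an illustration of the quaternion alternative, the sketch below builds a
rotation matrix from a quaternion (w, x, y, z) using the standard conversion.
It is not part of the described system; the function name is hypothetical,
and the input is normalized first so that any nonzero quaternion yields a
proper rotation.

```python
import math


def quaternion_to_matrix(w, x, y, z):
    """Return the 3x3 rotation matrix for quaternion (w, x, y, z)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # enforce unit length
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


# A 90-degree rotation about the z axis.
half = math.radians(90) / 2
Rz90 = quaternion_to_matrix(math.cos(half), 0.0, 0.0, math.sin(half))
```

A quaternion uses four parameters with a single unit-length constraint and
has no orientation at which the parameterization becomes singular, which is
the property that makes it attractive here.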

Since this tracker encourages the user to walk around large
spaces,  tripping  over  the  supporting  cables  is  a  danger.  We 
will  investigate  the  feasibility  of  a  wireless  datalink  to  remove 
this  problem. 

Under  certain  circumstances,  the  sensors  can  see  large  numbers 
of  beacons,  such  as  a  total  of  30  or  more.  While  using  many 
LEDs  usually  improves  the  solution  from  the  Collinearity 
module,  it  also  slows  down  the  update  rate  and  increases  the 
lag.  Further  experiments  are  needed  to  explore  this  tradeoff  and 
determine rules of thumb that provide a reasonable balance
between  resolution  and  update  rate. 

Cellular systems using different technologies or configurations
could be built to achieve similar scalable work areas. For
example, Ascension has announced a cellular magnetic system
[4]. Regardless of the technology, any cellular approach
creates the problem of beacon switching error or its equivalent.
Steps we took to control these errors would apply to other
technologies as well: 1) precise positioning and measurement
of system components, 2) averaging techniques to reduce
random error sources, and 3) calibration routines to compensate
for systematic error sources.


10  Future  work 

We intend to continue improving this system. In addition to
the tasks listed in Section 9, we would eventually like to
expand the ceiling size to around 20' x 20', to provide much
greater range of movement, both quantitatively and
psychologically. Also, ample room exists to improve the
heuristics and optimize the code, increasing the update rate and
reducing latency.

But beyond these incremental improvements, we do not expect
to pursue this particular technology further. The system is a
vehicle for further research and provides room-sized tracking
capability today for HMD applications that require it. For
example, the UNC Walkthrough team has begun interview-based
user studies on what impact large-environment tracking
has on the architectural design of a kitchen. In the future,
emphasis  will  be  placed  on  technologies  that  allow  unlimited 
tracking  volumes  in  unstructured  environments.  This  potential 
exists  in  systems  that  measure  only  the  relative  differences  in 
position  and  orientation  as  the  user  moves,  integrating  these 
differences  over  time  to  recover  the  user’s  location.  Examples 
include  inertial  technologies  and  Self-Tracker.  Since  these 
technologies  suffer  from  drift  problems,  initial  versions  may 
be  hybrid  systems  reliant  on  the  optical  tracker  for  auxiliary 
information.  Thus,  the  optical  tracking  system  will  serve  as  a 
testbed  for  its  own  successor. 

Tracking  HMDs  will  only  get  harder  in  the  future.  The  higher 
resolution  displays  being  developed  demand  higher  resolution 
trackers.  See-through  HMDs  add  additional  requirements.  In 
the  completely-enclosed  HMDs  commonly  used  today,  the 
entire  world  is  virtual,  so  resolution  is  much  more  important 
than accuracy. But for a see-through HMD, accurate
registration of the HMD to the real world is vital. The effects
of latency will also become more disturbing in see-through
HMDs. Viewing computer-generated objects superimposed
upon the real world, where those objects move with significant
lag but the real world does not, will not provide a convincing
illusion. People can perceive as little as 5 ms of lag [15], and
it is unlikely that the combined tracker and graphics engine
latency will be below that anytime soon. Therefore,
compensation techniques need to be explored [19][24]. If
HMDs  are  to  achieve  their  potential  of  making  a  user  truly  feel 
immersed  inside  a  virtual  world,  significant  advances  in 
tracking technologies must occur.

ABSTRACT

Techniques are discussed for creating a rendered view into a 3D
scene, interactively based on the locations and orientations of the
observer's head and the display surface. Stereoscopic head-mounted
displays (HMDs) demonstrate a simplified, special case
of these techniques, because the eyes and monitors move in unison.
A largely overlooked class of interactive displays uses the relative
positions between the eyes and monitor as input. These
displays can be stereo or monoscopic, fixed or mobile, and the
rendering process should incorporate the correct perspective
distortion, which depends on the locations of the viewpoint(s) and the
display monitor.

Three real-time graphics display systems were prototyped and
examined: a high-resolution display which corrects the perspective
projection based on the location of the observer's eye; the same
display, extended to modify the view as the monitor is tilted and
swiveled; and a handheld LCD display which can be freely moved
and rotated as it displays a view based on the eye and monitor
positions.

A simple experiment indicates that tracking the head and providing
the appropriate view improves the ability to pick specific
3D locations in space using a 2D display, when compared to a
fixed view and a mouse-controlled view.

1.  INTRODUCTION

In the everyday world, we continually shift our visual attention
from place to place. We rotate the eyes and head, scanning different
regions of our field of view. In addition, we move our heads to
different locations in space, changing our viewpoints. As an
observer changes his or her viewpoint, objects at different relative
depths appear to move with respect to each other. This effect is
known as motion parallax, a powerful depth cue [5;7]. Changing
one's viewpoint also allows an observer to “look around” objects,
and to see the different sides of objects, obtaining multiple
perspective views. Perspective and motion parallax are both monocular
depth cues; the sensation of depth we derive from them
requires only one eye, and thus, requires only a 2D display.

Motion parallax can be used to increase the visual correspondence
between an operator and a remote or synthetic telerobotic
manipulator. An important aspect in the design of displays and
controls is creating isomorphisms between the local and remote
operations [8]. (See Figure 1.) For example, the movement of a
control should create a movement of the corresponding manipulator
in the same direction, of the same apparent magnitude, on the
display. An intelligent display should provide the operator with a
view “corrected” for his or her relative position to the display, so
that the displayed manipulator movements always appear isomorphic
with her or his own movements. An uncorrected view requires
that the operator remain exactly centered in front of the display, in
order to remain isomorphic. One way to provide the correct view is
through the use of a “true” 3D display, i.e. an autostereoscopic
display, which does not require viewing aids such as glasses [7].
Real-time autostereoscopic displays are problematic, especially
concerning bandwidth and computational requirements. For teleoperations,
a more difficult problem is the development of the
camera required to record the spatial information for an
autostereoscopic display. An alternate means of supplying the correct
view is to track the locations of the eyes, and then provide the
appropriate imagery. For a teleoperator, this requires that the
remote camera be servoed to the operator's head movements. In
addition, views in which the operator moves off-axis from the center
of the monitor require that the displayed image be distorted,
either by translating the receptors on the image-focus plane of the
camera, providing a sub-image from a wide field-of-view, or by
approximating the distortion in hardware/software. The use of
head-mounted display systems bypasses the problem of distortion,
since the eyes do not move relative to the displays.

The modification to the rendering process to generate off-axis
perspective projections is straightforward, using parameters
already built into most rendering systems. This can easily be
implemented on today's real-time rendering workstations, through
the addition of any number of tracking methods. Unfortunately,
this technique has been largely overlooked, despite its ease of
implementation and perceptual benefits. It is important that the
correct perspective distortion be incorporated in the rendering
process. This is not a type of “eye-in-hand” or “eye-on-head” camera
control paradigm, in which only the eyepoint and viewing direction
are modified [13]. Instead, it is an accurate way of modeling
the visual characteristics of a 3D scene.


Figure 1: Isomorphisms between a remote robotic manipulator and human
operator. Measures that appear equal between the two diagrams are, in fact,
equal. The operator cannot put his or her hand through the display, obviously.
However, the use of a head-mounted or a flat panel display allows the optical
image of the display to share the same space as the operator's hand. Adapted
from Sheridan [8].

Prototype display systems were developed by the author to
examine the use of tracking techniques to provide an accurate
perspective projection, based on the relative positions of the viewer’s
eyes, the display surface, and the “real,” inertial reference frame.
Qualitatively, these displays add a great deal of depth perception
via motion parallax. The ease of “look around” by moving the
head is also a very attractive feature. Providing for a mobile
display creates even greater flexibility for “look around” and the
exploration of 3D scenes.

A  simple  experiment  was  conducted  in  order  to  explore  the 
importance  of  isomorphic  imaging  on  perceiving  and  interacting 
with  three-dimensional  information.  Specifically,  the  experiment 
tested how many times a subject could move a three-dimensional
cursor  to  a  three-dimensional  target  within  a  given  time  period 
while  viewing  a  2D  display.  Different  phases  of  the  experiment 
tested  the  subject’s  responses  when  the  view  was  fixed,  when  the 
view  could  be  interactively  changed  using  a  mouse,  and  when  the 
view  could  be  interactively  changed  by  moving  the  head. 

By adding tracked objects in real space which have matching
computer representations, important applications can be developed.
For example, for medical examination and surgical planning
and assist, computer models and scanned data of internal body
features can be isomorphically displayed in the “patient space,” along
with tracked surgical instruments. Similarly, for training and
repair, real world objects can be augmented with computer models
to guide, instruct, and inform the user.

2.  BACKGROUND

Head-mounted displays have been used to interactively view and
explore 3D data and scenes for a number of years, recently gaining
more popularity [3;11]. The head is tracked, and imagery is generated
appropriate for the viewing location and direction. Boom-mounted
displays provide similar functionality, allowing for more
massive, high-resolution displays and greater ease of use in certain
situations [6].

A different approach was taken by Fisher, who used a monitor
fixed in place, allowing the eyes to move relative to the display.
Videodisc technology was used to store and play back multiple
images of a scene, from different viewpoints. The observer’s head
was tracked and the appropriate image for that viewpoint location
was displayed on a CRT display, creating what Fisher termed
viewpoint dependent imaging [2].

About the same time, a similar system was demonstrated by
Diamond, et al., using real-time image generation. Wire-frame
rendering was used to generate the perspective projection appropriate
for the observer’s eyepoint, tracked by a light bulb on the head
using a video camera. The authors described the effect of their
monoscopic system as “dynamic parallax” [1].

The above technique was extended by Suetens, et al., to provide
a stereoscopic image, using electro-optical shutter glasses. A
Polhemus sensor was used to track the head, and a stereoscopic
wire-frame rendering was generated in real-time [10].

Venolia and Williams created a similar system, which provided
for real-time shaded stereoscopic imagery. In order to provide
more complex imagery than could be generated in real-time, they
employed a “viewpoint array” similar to Fisher’s approach. The
precomputed images were stored in memory and were displayed
based on the observer’s horizontal location [12].

This paper provides more details than the above references on
the transformations used to generate viewpoint dependent images.
It also extends this technique to allow for a mobile display surface.
By tracking both the head and monitor, greater flexibility is
achieved in the exploration of 3D information, while retaining an
isomorphic correspondence between the synthetic space and the
real, laboratory space.


Figure 2: The perspectives and sizes of the 2D projections of 3D objects
change as the viewpoint moves.

3.  FIXED-DISPLAY  MONOCULAR  SYSTEM 

Figure 2 shows an example of how the perspective projection of a
3D object is modified as the view changes. Points which lie at the
same depth as the screen are the only ones which do not “move”
relative to the screen as the viewpoint changes. Figure 3 depicts a
stereoscopic, viewpoint dependent display. The display screen acts
like a “window” into the three-dimensional space, cutting off the
view of objects which lie outside the current viewing volume.
Objects “behind” the screen are cut off just as we expect a real
window to obscure objects. Objects in front of the screen and outside
the viewing volume are also clipped. However, this is not a
phenomenon we are familiar with from our everyday experiences.
The “closer” objects are seemingly obscured by the screen, “further”
back. This is often called a “window violation” and can
significantly disrupt the depth perception of the scene, whether using
a stereoscopic or monoscopic display.

To  generate  a  viewpoint  dependent  image,  a  normal  perspective 
rendering  takes  place,  using  a  “window”  onto  the  view-plane, 
which is off center from the vector which passes through the eyepoint
and is normal to the display surface. Figure 4 shows an
example viewing setup. The window center rendering parameter is
used to shift the area to be rendered away from the view normal
[4;9].

The monitor’s and the observer’s locations and dimensions are
tracked and located in the rendering “world-space” with the 3D
objects. The eye location is established as a constant translational
offset within the head tracking coordinate frame. A coordinate
frame is established for the monitor, which has its origin at the
center of the display surface. Matters are simplified if the coordinate
axes are aligned with the display normal and the “vertical”
and “horizontal” directions, such as the coordinate frame depicted
in Figure 4.


Figure 3: An off-axis view onto a stereoscopic, viewpoint dependent display.
The screen acts as a “window” into the space, clipping objects both in the
foreground and background.

The viewing parameters are set as follows: the eyepoint is set to
the tracked location of the eye, in world space; the view normal is
set to the “inwards” monitor normal, rotated (and not translated)
into world space; the view up is set to the “vertical” monitor vector,
rotated into world space; the window half-size is set to one half of
the monitor's actual size; the view distance is set to the distance of
the eye from the monitor plane, easily attainable by transforming
the world-space location of the eye into the monitor's coordinate
frame, and using the “height” of the eye, along the display normal
(-normal · eye); and the window center is set to offset the eye's
position relative to the display surface's center: (-horiz · eye,
-vertical · eye). These calculations assume that the display surface
is planar.
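
The parameter settings above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the author's implementation: the monitor frame is assumed to be given as a center point plus three orthonormal unit vectors expressed in world space, the function names (off_axis_view, frustum_at_near) are hypothetical, and the final scaling of the window to a near clipping plane is an added convenience in the style of a conventional asymmetric-frustum call.

```python
# Minimal sketch of the viewing-parameter setup described above.
# All names are hypothetical; the monitor axes are assumed to be
# orthonormal unit vectors expressed in world space.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def off_axis_view(eye_world, monitor_center, horiz, vertical, normal_in,
                  half_width, half_height):
    """Return (view distance, window center, window half-size) for a planar display."""
    # Eye position relative to the display center (the monitor frame origin).
    eye = sub(eye_world, monitor_center)
    # "Height" of the eye above the display plane: -normal . eye
    view_distance = -dot(normal_in, eye)
    if view_distance <= 0.0:
        raise ValueError("eye must be in front of the display plane")
    # The window center offsets the eye's position relative to the display center.
    window_center = (-dot(horiz, eye), -dot(vertical, eye))
    return view_distance, window_center, (half_width, half_height)

def frustum_at_near(view_distance, window_center, half_size, near):
    """Scale the window on the display plane down to a near clipping plane.
    Assumption: the renderer wants left/right/bottom/top measured at 'near'."""
    s = near / view_distance
    cx, cy = window_center
    hw, hh = half_size
    return ((cx - hw) * s, (cx + hw) * s, (cy - hh) * s, (cy + hh) * s)

# Example: a 13" x 11" display centered at the origin, with the viewer 20" in
# front of it and 4" to the right of center (all units are inches).
d, center, half = off_axis_view(
    eye_world=(4.0, 0.0, 20.0),
    monitor_center=(0.0, 0.0, 0.0),
    horiz=(1.0, 0.0, 0.0),
    vertical=(0.0, 1.0, 0.0),
    normal_in=(0.0, 0.0, -1.0),  # points into the screen, away from the viewer
    half_width=6.5,
    half_height=5.5,
)
left, right, bottom, top = frustum_at_near(d, center, half, near=1.0)
```

With the eye to the right of center, the horizontal window center is negative, so the resulting frustum is asymmetric: its left edge lies farther from the view normal than its right edge.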

This system was implemented using a Hewlett Packard Model
833 UNIX workstation, with a “Turbo-SRX” real-time polygonal
rendering system (performance approximately 12 MIPS CPU,
38,000 shaded triangles per second). A Polhemus sensor was used
to track the head. The display surface is fairly large (13" x 11"),
with a resolution of 1280x1024.

This is the display system used in the experiment described in
Section 6. The system has been used to view 3D objects and
animations, qualitatively enhancing 3D perception significantly.

4.  MOBILE  DISPLAY  MONOCULAR  SYSTEM

By  tracking  the  position  and  orientation  of  the  display  monitor,  we 
can  accommodate  changes  in  its  location  in  the  rendering  process, 
so that isomorphism is retained between the imagery and the real
world. The monitor can be moved to attain a better view of the
data, or simply shifted to a more comfortable viewing position,
without  losing  the  correspondence  to  the  real  world  coordinates. 

The  fixed-display  method  is  extended  simply  by  tracking  the 
monitor,  and  adding  the  appropriate  transformations.  A  monitor 
coordinate frame is established as above, only in this case, the
monitor frame is a “child” of the display's tracking device coordinate
frame, rotating to the normalized monitor space, and translating
to the display center.

Two mobile display systems were implemented. The first used
the high-resolution HP display, allowing it to tilt and swivel. The
display could be translated as well, but it is quite bulky. The Polhemus
sensor was mounted on a “boom,” away from the EM field of
the CRT. It is important to mount the sensor as close as
possible to the monitor's center, however, since error and noise in
the orientation sensing will be amplified by distance. Movement of
the monitor proved useful for adjusting the view, and for exploring
the data without losing the correspondence between object space
and real space. The display was quite “jittery,” unfortunately, due
to tracking noise. However, a mode can be employed to deactivate
monitor tracking when it is not being moved, to reduce the overall
noise. Ideally, a low-noise tracking system would be employed,
such as measuring the joint angles in the monitor base.


Figure 4: Shifting the “window center” based on the position of the eye
generates the appropriate perspective for that viewpoint. The “window center”
parameter is used in the rendering pipeline to control a shear transformation,
which aligns the center-line of the viewing pyramid with the z-axis, in the
coordinate system shown here.

The second mobile display used a small (2.5" x 1.8"), hand-held
LCD screen, tracked by a Polhemus, which could be freely moved
in space. This system was interesting due to its high mobility: the
user could quickly explore 3D data, from many different positions
and orientations. The small screen is certainly limiting, but the
results indicate that larger screens are worth exploring in this context.

5.  STEREOSCOPIC  SYSTEM 

The extension of the above systems to include stereo is very simple.
The second eyepoint is located in world-space in the same
manner as the first eye, with a different translational shift from the
tracked point (e.g. the Polhemus sensor). A second rendering is
generated from the second viewpoint, and the left and right eye
images are displayed in the appropriate manner for the type of
stereoscopic display used.

A tracking device should be used which detects orientation, as
well as position, so that the two eyes are accurately located in
space. In addition, the “roll” of the head can be detected, as it tilts
towards the sides, and the stereo imagery is automatically offset in
the appropriate direction. This can be especially important when
the display is mobile, since it may take on unusual viewing
configurations. The stereoscopic display must be able to support these
types of rotations; for example, some polarized systems use linear
polarization, which will not allow “rolls.”
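
Locating the two eyes from one tracked sensor can be sketched as follows. This is a hedged illustration only: the rotation is assumed to arrive as a 3x3 matrix mapping sensor-frame vectors into world space, the eye offsets of plus or minus 0.032 m are hypothetical values, and the helper names are not taken from the paper.

```python
import math

# Sketch: each eye is a fixed translational offset in the head-sensor frame,
# carried into world space by the sensor's tracked position and orientation.
# Offsets and function names below are hypothetical.

def mat_vec(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def roll_matrix(angle):
    """Rotation about the viewing (z) axis, i.e. the head tilting sideways."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def eye_positions(sensor_pos, sensor_rot, left_offset, right_offset):
    """World-space eye points for a tracked sensor position and rotation matrix."""
    left = add(sensor_pos, mat_vec(sensor_rot, left_offset))
    right = add(sensor_pos, mat_vec(sensor_rot, right_offset))
    return left, right

LEFT_OFFSET = (-0.032, 0.0, 0.0)   # assumed half interocular distance, meters
RIGHT_OFFSET = (0.032, 0.0, 0.0)
sensor = (0.0, 1.6, 0.5)

upright_left, upright_right = eye_positions(
    sensor, roll_matrix(0.0), LEFT_OFFSET, RIGHT_OFFSET)
rolled_left, rolled_right = eye_positions(
    sensor, roll_matrix(math.pi / 2), LEFT_OFFSET, RIGHT_OFFSET)
```

With the head upright the eyes are separated horizontally; after a 90 degree roll the same offsets separate them vertically, which is why a position-only tracker cannot place both eyes correctly.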

Due  to  a  lack  of  equipment,  we  have  not,  as  yet,  experimented 
with  a  non-HMD  stereoscopic  display. 

6.  EXPERIMENT 

An informal experiment was conducted to test the effect of viewpoint
dependent control on the speed required to manually locate a
three  dimensional  target  location.  The  fixed-monitor,  moving 
viewpoint  system  was  used,  as  described  in  Section  3.  A  second 
Polhemus  sensor  was  used  to  track  the  hand  location. 

The experiment progresses as follows: A red cube, 2 cm per
side (in modeling space and “real” space), appears on the display
to act as the target. A blue cube, also 2 cm per side, is displayed,
and acts as a cursor, tracking the motions of the hand. The cursor,
in the depicted 3D space, moves with the same magnitude and
directions as the tracked hand, simply offset by a translation. The
task is to align the cursor cube to the target cube (translation only,
no orientation) within a given distance tolerance (1 cm). Once
aligned, the target moves to a new random location within the
workspace. The subject is instructed to reach the target as many
times as he or she can, within the given, fixed time limit.


Figure 5: The experiment results from the “expert” subjects. The different
phases of the experiments are shown across the plot on the x axis, and the
number of successful target matches is shown on the y axis. The mean score
is indicated by the central horizontal bar. The line boxes, partially overlapping
the grey boxes, indicate the median 25%-75% range of the scores.

There are three phases of the experiment: one in which the view
is fixed and unchanging, one in which the viewpoint can be moved
using a mouse, and one in which the viewpoint is directly controlled
by head movements.

Eleven  subjects  were  run  through  the  experiment,  four  novices 
and  seven  experts  (subjects  familiar  with  real-time  rendering  and 
tracking systems). Figure 5 shows the data from the expert subjects.
The novice subjects had the lowest scores, and their results
were more widely varying than the experts. In general, performance
did increase under viewpoint dependent control, although
not  dramatically.  Use  of  the  mouse  generally  decreased  the  score. 

Qualitatively, the subjects preferred the viewpoint dependent
control, especially as compared to the mouse control, which most
found confusing. Some subjects considered the “jitter” in the view,
due to the noise from the Polhemus tracker, to be distracting; others
thought it helped give a better sense of the depth, due to the
small amount of resulting motion parallax. This effect could be
tested experimentally.


7.  DISCUSSION

Providing renderings based on the true viewing parameters of the
observer and display has proven to enhance the 3D perception of
real-time graphics, in our applications and experiments. Qualitatively,
these displays significantly enhanced depth perception via
motion parallax, and the ability to “look around” objects and
explore the 3D scene, using intuitive motions. These displays generated
significant interest and excitement in the lab.

The mobile LCD prototype display is too small to be of use for
many applications, but it demonstrates very intriguing viewing
qualities. The objects displayed on it are convincingly 3D, not so
much in that they “look” 3D, but rather, in that the 3D nature of the
data is so easy to explore.

There are interesting differences between these displays and
HMDs. These displays are particularly non-intrusive and non-disorienting,
since most of the eyes’ FOV remains within the real
world, and visual jitter does not, therefore, strongly conflict with
the vestibular system. Higher effective resolutions are achieved,
since the pixels occupy smaller visual angles.

Tracking noise is currently a problem in these prototypes, especially
in the mobile-monitor systems. Tracking systems are available
which generate significantly lower noise than Polhemus
trackers. In particular, articulated arms could be used to measure
monitor positions with high accuracy and low noise.

The experiment helped confirm the utility of viewpoint dependent
imaging in 3D picking operations. Further experiments
should be designed in which a more complete understanding of the
3D scene is required, perhaps adding orientation criteria and more
complex environments. In this experiment, the task seemed too
simple  and  quick  to  execute,  in  that  the  subjects  would  not  take  the 
extra  time  to  obtain  multiple  views  unless  it  was  required.  An 
experiment which “rewards” visual exploration would be more
appropriate  to  investigate  the  perceptual  benefits  derived  from 
interactive display techniques.


Device Synchronization Using an Optimal Linear Filter


Martin Friedmann, Thad Starner and Alex Pentland


Abstract 

In order to be convincing and natural, interactive graphics applications
must correctly synchronize user motion with rendered graphics
and sound output. We present a solution to the synchronization
problem that is based on optimal estimation methods and fixed-lag
dataflow techniques. A method for discovering and correcting
prediction errors using a generalized likelihood approach is also
presented. And finally, MusicWorld, a simulated environment employing
these ideas, is described.

CR Categories and Subject Descriptors: I.3.6 [Computer
Graphics]: Methodology and Techniques - Interaction Techniques;
D.2.2 [Software Engineering]: Tools and Techniques - User Interfaces

Additional Keywords: Real-time graphics, artificial reality, interactive
graphics, Kalman filtering, device synchronization.

1  Introduction 

In order to be convincing and natural, interactive graphics applications
must correctly synchronize user motion with rendered graphics
and sound output. Exact synchronization of user motion and
rendering is critical: lags greater than 100 msec in the rendering of
hand motion can cause users to restrict themselves to slow, careful
movements while discrepancies between head motion and rendering
can cause motion sickness [3; 5]. In systems that generate sound,
small delays in sound output can confuse even practiced users.
This paper proposes a suite of methods for accurately predicting
sensor position in order to more closely synchronize processes in
distributed virtual environments.

Problems in synchronization of user motion, rendering, and
sound arise from three basic causes. The first cause is noise in
the sensor measurements. The second cause is the length of the
processing pipeline, that is, the delay introduced by the sensing device,
the CPU time required to calculate the proper response, and
the time spent rendering output images or generating appropriate
sounds. The third cause is unexpected interruptions such as network
contention or operating system activity. Because of these
factors, using the raw output of position sensors leads to noticeable
lags and other discrepancies in output synchronization.

Unfortunately, most interactive systems either use raw sensor
positions, or they make an ad-hoc attempt to compensate for the
fixed delays and noise. A typical method for compensation averages
current sensor measurements with previous measurements to obtain
a  smoothed  estimate  of  position.  The  smoothed  measurements  are 
then  differenced  for  a  crude  estimate  of  the  user’s  instantaneous 
velocity.  Finally,  the  smoothed  position  and  instantaneous  velocity 
estimates  are  combined  to  extrapolate  the  user's  position  at  some 
fixed  interval  in  the  future. 
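
The ad-hoc compensation scheme just described can be sketched as follows, purely to make the three steps concrete. The function names, the window length, and the one-dimensional constant-velocity test motion are assumptions chosen for illustration; no particular system's code is being reproduced.

```python
# Sketch of the ad-hoc predictor described above (hypothetical names):
# 1) average recent samples, 2) difference successive averages to get a
# crude velocity, 3) extrapolate a fixed lead time into the future.

def moving_average(samples, window):
    recent = samples[-window:]
    return sum(recent) / len(recent)

def adhoc_predict(samples, dt, window, lead):
    if len(samples) < window + 1:
        raise ValueError("need at least window + 1 samples")
    smoothed_now = moving_average(samples, window)
    smoothed_prev = moving_average(samples[:-1], window)
    velocity = (smoothed_now - smoothed_prev) / dt
    return smoothed_now + velocity * lead

# A sensor moving at a constant 1 unit/s, sampled every 0.1 s with no noise.
dt = 0.1
samples = [k * dt for k in range(10)]      # positions at t = 0.0 ... 0.9
predicted = adhoc_predict(samples, dt, window=4, lead=dt)
true_future = 10 * dt                      # actual position at t = 1.0
error = true_future - predicted
```

Even in this noise-free case the prediction trails the true position, because a window of N samples lags steady motion by (N - 1) / 2 sample intervals. The faster the motion, the larger that lag becomes.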

Problems  with  this  approach  arise  when  the  user  either  moves 
quickly,  so  that  averaging  sensor  measurements  produces  a  poor 
estimate  of  position,  or  when  the  user  changes  velocity,  so  that 
the predicted position overshoots or undershoots the user’s actual
position. As a consequence, users are forced to make only slow,
deliberate  motions  in  order  to  maintain  the  illusion  of  reality. 

We  present  a  solution  to  these  problems  based  on  the  ability  to 
more  accurately  predict  future  user  positions  using  an  optimal  linear 
estimator  and  on  the  use  of  fixed-lag  dataflow  techniques  that  are 
well-known  in  hardware  and  operating  system  design.  The  ability 
to accurately predict future positions eases the need to shorten the
processing pipeline because a fixed amount of “lead time” can be
allotted to each output process. For example, the positions fed to
the rendering process can reflect sensor measurements one frame
ahead of time so that when the image is rendered and displayed,
the effect of synchrony is achieved. Consequently, unpredictable
system and network interruptions are invisible to the user as long as
they  are  shorter  than  the  allotted  lead  time. 


2  Optimal  Estimation  of  Position  and 
Velocity 

At the core of our technique is the optimal linear estimation of future
user position. To accomplish this it is necessary to consider the
dynamic properties of the user’s motion and of the data measurements.
The Kalman filter [4] is the standard technique for obtaining
optimal linear estimates of the state vectors of dynamic models and
for predicting the state vectors at some later time. Outputs from the
Kalman filter are the maximum likelihood estimates for Gaussian
noises, and are the optimal (weighted) least-squares estimates for
non-Gaussian noises [2].

In our particular application we have found that it is initially
sufficient to treat only the translational components (the x, y, and z
coordinates) output by the Polhemus sensor, and to assume independent
observation and acceleration noise. In this section, therefore,
we will develop a Kalman filter that estimates the position and velocity
of a Polhemus sensor for this simple noise model. Rotations
will be addressed in the following section.

2.1  The Kalman Filter

Let us define a dynamic process

$$X_{k+1} = f(X_k, \Delta t) + \xi(t) \qquad (1)$$

where the function f models the dynamic evolution of state vector
$X_k$ at time k, and let us define an observation process

$$Y_k = h(X_k, \Delta t) + \eta(t) \qquad (2)$$

where the sensor observations Y are a function h of the state vector
and time. Both $\xi$ and $\eta$ are white noise processes having known
spectral density matrices.

In our case the state vector $X_k$ consists of the true position,
velocity, and acceleration of the Polhemus sensor in each of the x,
y, and z coordinates, and the observation vector $Y_k$ consists of the
Polhemus position readings for the x, y, and z coordinates. The
function f will describe the dynamics of the user's movements in
terms of the state vector, i.e., how the future position in x is related
to current position, velocity, and acceleration in x, y, and z. The
observation function h describes the Polhemus measurements in
terms of the state vector, i.e., how the next Polhemus measurement
is related to current position, velocity, and acceleration in x, y, and
z.

Using Kalman's result, we can then obtain the optimal linear
estimate $\hat{X}_k$ of the state vector $X_k$ by use of the following Kalman
filter:

$$\hat{X}_k = X_k^* + K_k \left( Y_k - h(X_k^*, t) \right) \qquad (3)$$

provided that the Kalman gain matrix $K_k$ is chosen correctly [4].
At each time step k, the filter algorithm uses a state prediction $X_k^*$,
an error covariance matrix prediction $P_k^*$, and a sensor measurement
$Y_k$ to determine an optimal linear state estimate $\hat{X}_k$, error
covariance matrix estimate $\hat{P}_k$, and predictions $X_{k+1}^*$, $P_{k+1}^*$ for
the next time step.

The prediction of the state vector $X_{k+1}^*$ at the next time step is
obtained by combining the optimal state estimate $\hat{X}_k$ and Equation
1:

$$X_{k+1}^* = \hat{X}_k + f(\hat{X}_k, \Delta t)\,\Delta t \qquad (4)$$

In our graphics application this prediction equation is also used
with larger time steps, to predict the user's future position. This
prediction  allows  us  to  maintain  synchrony  with  the  user  by  giving 
us  the  lead  time  needed  to  complete  rendering,  sound  generation, 
and  so  forth. 

2.1.1 Calculating the Kalman Gain Factor

The Kalman gain matrix $K_k$ minimizes the error covariance
matrix $\hat{P}_k$ of the error $X_k - \hat{X}_k$, and is given by

$$K_k = P_k^* H_k^T \left( H_k P_k^* H_k^T + R \right)^{-1} \qquad (5)$$

where $R = E[\eta(t)\eta(t)^T]$ is the n x n observation noise spectral
density matrix, and the matrix $H_k$ is the local linear approximation
to the observation function h,

$$[H_k]_{ij} = \partial h_i / \partial x_j \qquad (6)$$

evaluated at $X = X_k^*$.

Assuming that the noise characteristics are constant, then the
optimizing error covariance matrix $P_k^*$ is obtained by solving the
Riccati equation

$$0 = \dot{P}_k^* = F_k P_k^* + P_k^* F_k^T - P_k^* H_k^T R^{-1} H_k P_k^* + Q \qquad (7)$$

where $Q = E[\xi(t)\xi(t)^T]$ is the n x n spectral density matrix of the
system excitation noise and $F_k$ is the local linear approximation
to the state evolution function f,

$$[F_k]_{ij} = \partial f_i / \partial x_j \qquad (8)$$

evaluated at $X = \hat{X}_k$.

More generally, the optimizing error covariance matrix will vary
with time, and must also be estimated. The estimate covariance is
given by

$$\hat{P}_k = (I - K_k H_k) P_k^* \qquad (9)$$

From this the predicted error covariance matrix can be obtained

$$P_{k+1}^* = \Phi_k \hat{P}_k \Phi_k^T + Q \qquad (10)$$

where $\Phi_k$ is known as the state transition matrix

$$\Phi_k = (I + F_k \Delta t) \qquad (11)$$
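
The covariance recursion of Equations 5, 9, 10, and 11 can be sketched in a few lines of plain Python. The sketch below is only an illustration under a simplifying assumption that is not part of the derivation above: the observation is a single scalar, so H is one row and the matrix inverse in Equation 5 reduces to a division. The helper names (`mat_mul`, `kalman_gain`, `estimate_covariance`, `predict_covariance`) are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def kalman_gain(p_pred, h_row, r):
    """Equation 5 for a scalar observation: K = P* H^T / (H P* H^T + R)."""
    n = len(p_pred)
    pht = [sum(p_pred[i][j] * h_row[j] for j in range(n)) for i in range(n)]
    s = sum(h_row[i] * pht[i] for i in range(n)) + r
    return [x / s for x in pht]


def estimate_covariance(p_pred, gain, h_row):
    """Equation 9: P_hat = (I - K H) P*."""
    n = len(p_pred)
    i_minus_kh = [[(1.0 if i == j else 0.0) - gain[i] * h_row[j]
                   for j in range(n)] for i in range(n)]
    return mat_mul(i_minus_kh, p_pred)


def predict_covariance(p_hat, f_mat, q, dt):
    """Equations 10 and 11: P*_next = Phi P_hat Phi^T + Q, Phi = I + F dt."""
    n = len(p_hat)
    phi = [[(1.0 if i == j else 0.0) + f_mat[i][j] * dt
            for j in range(n)] for i in range(n)]
    prod = mat_mul(mat_mul(phi, p_hat), transpose(phi))
    return [[prod[i][j] + q[i][j] for j in range(n)] for i in range(n)]
```

With an identity prediction covariance, an observation row that reads only the first state, and R = 1, the gain on the observed state is one half, so the observed state's variance is halved while the unobserved states are left unchanged.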

2.2 Estimation of Displacement and Velocity

In our graphics application we use the Kalman filter described above
for the estimation of the displacements $P_x$, $P_y$, and $P_z$, the velocities
$V_x$, $V_y$, and $V_z$, and the accelerations $A_x$, $A_y$, and $A_z$ of
Polhemus sensors. The state vector X of our dynamic system is
therefore $(P_x, V_x, A_x, P_y, V_y, A_y, P_z, V_z, A_z)^T$, and the state evolution
function is

$$f(X, \Delta t) = \begin{bmatrix}
V_x + A_x \frac{\Delta t}{2} \\
A_x \\
0 \\
V_y + A_y \frac{\Delta t}{2} \\
A_y \\
0 \\
V_z + A_z \frac{\Delta t}{2} \\
A_z \\
0
\end{bmatrix} \qquad (12)$$


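The observation vector Y will be the positions $Y = (P'_x, P'_y, P'_z)^T$
that are the output of the Polhemus sensor. Given
a state vector X we predict the measurement using simple second
order equations of motion:

$$h(X, \Delta t) = \begin{bmatrix}
P_x + V_x \Delta t + A_x \frac{\Delta t^2}{2} \\
P_y + V_y \Delta t + A_y \frac{\Delta t^2}{2} \\
P_z + V_z \Delta t + A_z \frac{\Delta t^2}{2}
\end{bmatrix} \qquad (13)$$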


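Calculating the partial derivatives of Equations 6 and 8 we obtain

$$F = \begin{bmatrix}
0 & 1 & \frac{\Delta t}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & \frac{\Delta t}{2} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \frac{\Delta t}{2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \qquad (14)$$

$$H = \begin{bmatrix}
1 & \Delta t & \frac{\Delta t^2}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & \Delta t & \frac{\Delta t^2}{2} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \Delta t & \frac{\Delta t^2}{2}
\end{bmatrix} \qquad (15)$$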


Finally, given the state vector $X_k$ at time k we can predict the
Polhemus measurements at time $k + \Delta t$ by

$$Y_{k+\Delta t} = h(X_k, \Delta t) \qquad (16)$$

and the predicted state vector at time $k + \Delta t$ is given by

$$X_{k+\Delta t} = X_k + f(X_k, \Delta t)\,\Delta t \qquad (17)$$
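
As a rough illustration of Equations 12, 13, 16, and 17, the following Python sketch predicts one coordinate axis a lead time ahead. It assumes, as the independent-noise model suggests, that each axis can be handled separately as a (position, velocity, acceleration) triple. The function names and the numeric values are hypothetical.

```python
def f_axis(state, dt):
    """Per-axis rows of Equation 12: rates of change of (P, V, A)."""
    p, v, a = state
    return (v + a * dt / 2.0, a, 0.0)


def h_axis(state, dt):
    """Per-axis row of Equation 13: position expected dt from now."""
    p, v, a = state
    return p + v * dt + a * dt * dt / 2.0


def predict_state(state, dt):
    """Equation 17 applied to one axis: X + f(X, dt) * dt."""
    rates = f_axis(state, dt)
    return tuple(x + r * dt for x, r in zip(state, rates))


state = (10.0, 2.0, 4.0)   # hypothetical P (mm), V (mm/sec), A (mm/sec^2)
lead = 0.5                 # hypothetical lead time in seconds
print(predict_state(state, lead))  # predicted (P, V, A)
print(h_axis(state, lead))         # predicted sensor reading
```

The position component returned by `predict_state` agrees with `h_axis`, which is the consistency one would expect between Equations 16 and 17.
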

2.2.1 The Noise Model

We have experimentally developed a noise model for user motions.
Although our noise model is not verifiably optimal, we find
the results to be quite sufficient for a wide variety of head and hand
tracking applications. The system excitation noise model $\xi$ is designed
to compensate for large velocity and acceleration changes;
we have found

$$\xi(t)^T = [\, 1 \;\; 20 \;\; 63 \;\; 1 \;\; 20 \;\; 63 \;\; 1 \;\; 20 \;\; 63 \,] \qquad (18)$$

(where $Q = \xi(t)\xi(t)^T$) provides a good model. In other words,
we expect and allow for positions to have a standard deviation of
1 mm, velocities 20 mm/sec, and accelerations 63 mm/sec^2. The
observation noise is expected to be much lower than the system
excitation noise. The spectral density matrix for observation noise
is $R = \eta(t)\eta(t)^T$; we have found that

$$\eta(t)^T = [\, .25 \;\; .25 \;\; .25 \,] \qquad (19)$$

provides a good model for the Polhemus sensor.
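
One way to turn these figures into filter inputs is sketched below in Python. The sketch assumes, in line with the independent-noise model adopted earlier, that Q and R are diagonal with the squared standard deviations on the diagonal; a literal outer product would instead give a full matrix. The names `XI`, `ETA`, and `diagonal` are hypothetical.

```python
def diagonal(values):
    """Build a square matrix (list of rows) with `values` on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Standard deviations from Equations 18 and 19 (mm, mm/sec, mm/sec^2 per axis).
XI = [1.0, 20.0, 63.0] * 3
ETA = [0.25] * 3

# Assumed diagonal spectral density matrices: variances on the diagonal.
Q = diagonal([s * s for s in XI])
R = diagonal([s * s for s in ETA])
```

Under this assumption the observation variances (0.0625 mm^2) are far smaller than the excitation variances, which reflects the statement that observation noise is expected to be much lower than system excitation noise.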

2.3 Experimental Results and Comparison

Figure 1 shows the raw output of a Polhemus sensor attached to a
drumstick playing a musical flourish, together with the output of
our Kalman filter predicting the Polhemus's position 1/30th of a
second in the future.

As can be seen, the prediction is generally quite accurate. At
points of high acceleration a certain amount of overshoot occurs;
such problems are intrinsic to any prediction method but can be
minimized with more complex models of the sensor noise and the
dynamics of the user's movements.

Figure 2 shows a higher-resolution version of the same Polhemus
signal with the Kalman filter output overlaid. Predictions for 1/30,
1/15, and 1/10 of a second in the future are shown. For comparison,
Figure 3 shows the performance of the prediction made from
simple smoothed local position and velocity, as described in the
introduction. Again, predictions for 1/30, 1/15, and 1/10 of a second
in the future are shown. As can be seen, the Kalman filter provides
a more reliable predictor of future user position than the commonly
used method of simple smoothing plus velocity prediction.
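
For orientation, a baseline of the "smoothing plus velocity" kind might look like the Python sketch below. The details are assumptions made for illustration only: a moving average over the last `window` samples stands in for the smoothing, and velocity is taken as the difference between two successive smoothed positions. The function name is hypothetical.

```python
def smoothed_velocity_prediction(samples, dt, lead, window=2):
    """Predict a position `lead` seconds ahead from evenly spaced samples."""
    if len(samples) < window + 1:
        raise ValueError("need at least window + 1 samples")
    # Moving averages over the latest window and the one before it.
    current = sum(samples[-window:]) / window
    previous = sum(samples[-window - 1:-1]) / window
    velocity = (current - previous) / dt
    return current + velocity * lead


# Uniform motion at 1 unit per second, sampled once per second.
print(smoothed_velocity_prediction([0.0, 1.0, 2.0, 3.0], dt=1.0, lead=2.0))
```

For this uniformly moving input the true position two seconds after the last sample is 5.0, while the sketch returns 4.5. The smoothing itself introduces lag, which is one reason such a baseline can trail a model-based predictor.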


[Plot; axis label: Height (mm)]

Figure 2: Output of Kalman filter for various lead times.

Figure 3: Output of commonly used velocity prediction method.


3  Rotations 

With the Polhemus sensor, the above scheme can be directly extended
to filter and predict Euler angles as well as translations.


However, with some sensors it is only possible to read out
instant-by-instant incremental rotations. In this case the absolute rotational
state must be calculated by integration of these incremental rotations,
and the Kalman filter formulation must be altered as follows [1].
See also [6].

Let p be the incremental rotation vector, and denote the rotational
velocity and acceleration by d and a. The rotational acceleration
vector a is the derivative of d which is, in turn, the derivative of
p, but only when two of the components of p are exactly zero (in
some  frame  to  which  both  p  and  d  are  referenced).  For  sufficiently 
small  rotations  about  at  least  two  axes,  d  is  approximately  the  time 
derivative  of  p. 

For 3D tracking one cannot generally assume small absolute rotations,
so an additional representation of rotation, the unit quaternion
q and its rotation submatrix R, is employed. Let

$$q = (q_0, q_1, q_2, q_3)^T \qquad (20)$$

be the unit quaternion. Unit quaternions can be used to describe the
rotation of a vector v through an angle $\theta$ about an axis $\hat{n}$, where $\hat{n}$
is a unit vector. The unit quaternion associated with such a rotation
has scalar part

$$q_0 = \cos(\theta/2) \qquad (21)$$

and vector part

$$(q_1, q_2, q_3)^T = \hat{n} \sin(\theta/2). \qquad (22)$$

Note  that  every  quaternion  defined  this  way  is  a  unit  quaternion. 

By convention q is used to designate the rotation between the
global and local coordinate frames. The definition is such that the
orthonormal matrix

$$R = \begin{bmatrix}
q_0^2 + q_1^2 - q_2^2 - q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\
2(q_1 q_2 + q_0 q_3) & q_0^2 - q_1^2 + q_2^2 - q_3^2 & 2(q_2 q_3 - q_0 q_1) \\
2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2 - q_1^2 - q_2^2 + q_3^2
\end{bmatrix} \qquad (23)$$

transforms vectors expressed in the local coordinate frame to the
corresponding vectors in the global coordinate frame according to

$$v_{global} = R\, v_{local}. \qquad (24)$$
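
The rotation matrix of Equation 23 and the transformation of Equation 24 can be written out directly, as in the Python sketch below. It takes the scalar part of the quaternion as cos(θ/2) and the vector part as the unit axis scaled by sin(θ/2); this is the usual convention, and it is the one for which Equation 23 reduces to the identity at zero angle. The function names are hypothetical.

```python
import math


def quaternion_from_axis_angle(axis, angle):
    """Unit quaternion (q0, q1, q2, q3) for a rotation about a unit axis."""
    half = angle / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def rotation_matrix(q):
    """Equation 23 written out row by row."""
    q0, q1, q2, q3 = q
    return [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]


def to_global(q, v_local):
    """Equation 24: v_global = R v_local."""
    r = rotation_matrix(q)
    return [sum(r[i][j] * v_local[j] for j in range(3)) for i in range(3)]
```

As a check, a quarter turn about the z axis carries the local x axis onto the global y axis, up to floating-point rounding.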

In dealing with incremental rotations, the model typically assumes
that accelerations are an unknown "noise" input to the system,
and that the time intervals are small so that the accelerations at one
time step are close to those at the previous time step. The remaining
states result from integrating the accelerations, with corrupting
noise in the integration process.

The assumption that accelerations and velocities can be integrated
to obtain the global rotational state is valid only when $p_k$ is close
to zero and $p_{k+1}$ remains small. The latter condition is guaranteed
with a sufficiently small time step (or sufficiently small rotational
velocities). The condition $p_k = 0$ is established at each time step by
defining p to be a correction to a nominal (absolute) rotation, which
is maintained externally using a unit quaternion q that is updated at
each time step.
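
A minimal Python sketch of this bookkeeping follows. It rests on assumptions not stated in the text: the incremental vector p is read as an axis-angle rotation vector (its length is the angle in radians), the increment is composed on the right of the nominal quaternion, and the result is renormalized to guard against drift. The function names are hypothetical.

```python
import math


def quaternion_multiply(a, b):
    """Hamilton product of two quaternions given as (q0, q1, q2, q3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0*b0 - a1*b1 - a2*b2 - a3*b3,
        a0*b1 + a1*b0 + a2*b3 - a3*b2,
        a0*b2 - a1*b3 + a2*b0 + a3*b1,
        a0*b3 + a1*b2 - a2*b1 + a3*b0,
    )


def fold_in_increment(q, p):
    """Absorb the incremental rotation p into q; return (new q, reset p)."""
    angle = math.sqrt(sum(c * c for c in p))
    if angle == 0.0:
        return q, (0.0, 0.0, 0.0)
    s = math.sin(angle / 2.0) / angle
    dq = (math.cos(angle / 2.0), p[0] * s, p[1] * s, p[2] * s)
    q_new = quaternion_multiply(q, dq)
    norm = math.sqrt(sum(c * c for c in q_new))
    return tuple(c / norm for c in q_new), (0.0, 0.0, 0.0)
```

Folding in two successive increments of π/4 about the z axis, starting from the identity quaternion (1, 0, 0, 0), yields the quaternion for a quarter turn about z, and p is back at zero after each step.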

4  Unpredictable  Events 

We have tested our Kalman filter synchronization approach using
a simulated musical environment (described below) in which we
track a drumstick and simulate the sounds of virtual drums. For
smooth motions, the drumstick position is accurately predicted, so
that sound, sight, and motion are accurately synchronized, and the
user experiences a strong sense of reality.


The main difficulties that arise with this approach derive from
unexpected large accelerations, which produce overshoots and similar
errors. It is important to note, however, that overshoots are
not a problem as long as the drumstick is far from the drum. In these
cases the overshoots simply exaggerate the user's motion, and the
perception of synchrony persists. In fact, such overshoots seem
generally to enhance, not degrade, the user's impression of reality.

The  problem  occurs  when  the  predicted  motion  overshoots  the 
true  motion  when  the  drumstick  is  near  the  drumhead,  thus  causing 
a  false  collision.  In  this  case  the  system  generates  a  sound  when  in 
fact  no  sound  should  occur.  Such  errors  detract  noticeably  from  the 
illusion  of  reality. 

4.1 Correcting Prediction Errors

How can we preserve the impression of reality in the case of an
overshoot causing an incorrect response? In the case of simple
responses like sound generation, the answer is easy. When we
detect that the user has changed direction unexpectedly (that is,
that an overshoot has occurred) then we simply send an emergency
message aborting the sound generation process. As long as we can
detect that an overshoot has occurred before the sound is "released,"
there will be no error.
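
The abort mechanism can be pictured with a small Python sketch. It is purely illustrative: the class `PendingSound`, its fields, and the use of explicit timestamps in seconds are assumptions, and no claim is made about how the actual sound process receives its messages.

```python
class PendingSound:
    """A sound command queued `lead` seconds before it should be heard."""

    def __init__(self, name, issued_at, lead):
        self.name = name
        self.release_time = issued_at + lead
        self.aborted = False

    def abort(self, now):
        """Try to cancel the sound; this only works before the release time."""
        if now < self.release_time:
            self.aborted = True
        return self.aborted

    def should_play(self, now):
        """True once the release time has passed and no abort succeeded."""
        return now >= self.release_time and not self.aborted


snare = PendingSound("snare", issued_at=0.0, lead=1.0 / 15.0)
print(snare.abort(now=0.03))         # overshoot detected in time
print(snare.should_play(now=0.10))   # the false collision stays silent
```

An abort that arrives after the release time has no effect, which mirrors the condition that the overshoot must be detected before the sound is released.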

This  solution  can  be  implemented  quite  generally,  but  it  depends 
critically  upon  two  things.  The  first  is  that  we  must  be  able  to  very 
quickly  substitute  the  correct  response  for  the  incorrect  response. 
The second is that we must be able to accurately detect that an
overshoot has occurred.

In the case of sound generation due to an overshoot, it is easy to
substitute the correct response for the incorrect, because the correct
response is to do nothing. More generally, however, when we detect
that our motion prediction was in error we may have to perform
some quite complicated alternative response. To maintain synchronization,
therefore, we must be able to detect possible trouble spots
beforehand, and begin to compute all of the alternative responses
sufficiently far ahead of time that they will be available at the critical
instant.

The strategy, therefore, is to predict user motion just as before,
but at critical junctures to compute several alternative responses
rather than a single response. When the instant arrives that a response
is called for, we can then choose among the available
responses.

4.2 Detecting Prediction Errors

Given  that  we  have  computed  alternative  responses  ahead  of  time, 
and that we can detect that a prediction error has occurred, then we
can  make  the  correct  response.  But  how  are  we  to  detect  which  of 
(possibly  many)  alternative  responses  are  to  be  executed? 

The  key  insight  to  solving  this  detection  problem  is  that  if  we 
have  the  correct  dynamic  model  then  we  will  always  have  an  optimal 
linear  estimate  of  the  drumstick  position,  and  there  should  be  nothing 
much better that we can do. The problem, then, is that in some
cases our model of the event's dynamics does not match the true
dynamics. For instance, we normally expect accelerations to be
small and uncorrelated with position. However, in some cases (for
instance,  when  sharply  changing  the  pace  of  a  piece  of  music)  a 
drummer  will  apply  large  accelerations  that  are  exactly  correlated 
with  position. 

The solution is to have several models of the drummer's dynamics
running in parallel, one for each alternative response. Then at
each instant we can observe the drumstick position and velocity,
decide which model applies, and then make our response based on
that model. This is known as the multiple model or generalized likelihood
approach, and produces a generalized maximum likelihood
estimate of the current and future values of the state variables [10].
Moreover, the cost of the Kalman filter calculations is sufficiently
small to make the approach quite practical.

Figure 4: MusicWorld's drum kit.


Intuitively, this solution breaks the drummer's overall behavior
down into several "prototypical" behaviors. For instance, we might
have dynamic models corresponding to a relaxed drummer, a very
"light" drummer, and so forth. We then classify the drummer's
behavior by determining which model best fits the drummer's observed
behavior.

Mathematically, this is accomplished by setting up one Kalman
filter for the dynamics of each model:

$$\hat{X}_k^{(i)} = X_k^{*(i)} + K_k^{(i)} \left( Y_k - h^{(i)}(X_k^{*(i)}, t) \right) \qquad (25)$$

where the superscript (i) denotes the i-th Kalman filter. The measurement
innovations process for the i-th model (and associated
Kalman filter) is then

$$\Gamma_k^{(i)} = Y_k - h^{(i)}(X_k^{*(i)}, t) \qquad (26)$$


The measurement innovations process is zero-mean with covariance
R.

The i-th measurement innovations process is, intuitively, the part
of the observation data that is unexplained by the i-th model. The
model that explains the largest portion of the observations is, of
course, the model most likely to be correct. Thus at each time step
we calculate the probability $P^{(i)}$ of the m-dimensional observations
$Y_k$ given the i-th model's dynamics,

$$P^{(i)}(Y_k) = \frac{\exp\left( -\frac{1}{2}\, \Gamma_k^{(i)T} R^{-1} \Gamma_k^{(i)} \right)}{(2\pi)^{m/2} \det(R)^{1/2}} \qquad (27)$$

and choose the model with the largest probability. This model is then
used to estimate the current value of the state variables, to predict
their future values, and to choose among alternative responses.

When optimizing predictions of measurements $\Delta t$ in the future,
equation 26 must be modified slightly to test the predictive accuracy
of state estimates from $\Delta t$ in the past.


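$$\Gamma_k^{(i)} = Y_k - h^{(i)}\left( X_{k-\Delta t}^{(i)} + f^{(i)}(X_{k-\Delta t}^{(i)}, \Delta t)\,\Delta t,\; t \right) \qquad (28)$$


by substituting equation 17.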
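
The model-selection step can be sketched in Python as follows. The sketch makes several assumptions for illustration: the innovations are scored with a zero-mean Gaussian density, the covariance is diagonal (so no matrix inverse is needed), and log-probabilities are compared instead of probabilities, which selects the same model because the logarithm is monotonic. The function names are hypothetical.

```python
import math


def log_likelihood(innovation, variances):
    """Log of a zero-mean Gaussian density with diagonal covariance."""
    total = 0.0
    for g, var in zip(innovation, variances):
        total += -0.5 * (g * g / var + math.log(2.0 * math.pi * var))
    return total


def best_model(observation, predictions, variances):
    """Return the index of the model whose predicted measurement fits best."""
    scores = []
    for predicted in predictions:
        # Innovation: the part of the observation the model leaves unexplained.
        innovation = [y - h for y, h in zip(observation, predicted)]
        scores.append(log_likelihood(innovation, variances))
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, if one model predicts a reading 0.1 mm from the observation and another predicts a reading 2 mm away, the first model is selected.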


Figure 5: Communications used for control and filtering of Polhemus
sensor.

Figure 6: Communications and lead times for MusicWorld processes.

5 MusicWorld

Our solution is demonstrated in a musical virtual reality, an application
requiring synchronization of user, physical simulation,
rendering, and computer-generated sound. This system is called
MusicWorld, and allows users to play a virtual set of drums, bells,
or strings with two drumsticks controlled by Polhemus sensors. As
the user moves a physical drumstick the corresponding rendered
drumstick tracks accordingly. The instant the rendered drumstick
strikes a drum surface a sound generator produces the appropriate
sound for that drum. The visual appearance of MusicWorld is shown
in Figure 4, and a higher quality rendition is included in the color
section of these proceedings.

Figure 5 shows the processes and communication paths used to
filter and query each Polhemus sensor. Since we cannot ensure that
the application control process will query the Polhemus devices on
a regular basis, and since we do not want the above Kalman loop to
enter into the processing pipeline, we spawn two small processes to
constantly query and filter the actual device. The application control
process then, at any time, has the opportunity to make a fast query to
the filter process for the most up-to-date, filtered Polhemus position.
Using shared memory between these two processes makes the final
queries fully optimal.

MusicWorld is built on top of the ThingWorld system [7, 8],
which has one process to handle the problems of real-time physical
simulation and contact detection and a second process to handle
rendering. Sound generation is handled by a third process on a separate
host, running CSound [9]. Figure 6 shows the communication
network for MusicWorld, and the lead times employed.

The application control process queries the Kalman filter process
for the predicted positions of each drumstick at 1/15 and 1/30 of
a second. Two different predictions are used, one for each output
device. The 1/15 of a second predictions are used for sound and
are sent to ThingWorld to detect stick collisions with drums and
other sound generating objects. When future collisions are detected,
sound commands destined for 1/15 of a second in the future are sent
to CSound. Regardless of collisions and sounds, the scene is always
rendered using the positions predicted at 1/30 of a second in the
future, corresponding to the fixed lag in our rendering pipeline. In
general, it would be better to constantly check and update
the lead times actually needed for each output process, to ensure
that dynamic changes in network speeds, or in the complexity of the
scene (rendering speeds) do not destroy the effects of synchrony.
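
The use of a separate lead time per output device can be illustrated with a short Python sketch. Only the two lead times come from the text. The one-axis state, the threshold test against a drum height (standing in for ThingWorld's collision detection), and all names are hypothetical simplifications.

```python
LEAD_TIMES = {"sound": 1.0 / 15.0, "render": 1.0 / 30.0}  # seconds


def predicted_height(state, lead):
    """Second-order extrapolation of one axis from (P, V, A)."""
    p, v, a = state
    return p + v * lead + a * lead * lead / 2.0


def plan_outputs(state, drum_height):
    """Return the height to render and whether to queue a drum sound."""
    render_height = predicted_height(state, LEAD_TIMES["render"])
    sound_height = predicted_height(state, LEAD_TIMES["sound"])
    # Hypothetical collision test: the stick tip reaches the drum surface.
    return render_height, sound_height <= drum_height


# Stick tip 30 mm above the drum, moving down at 600 mm/sec.
print(plan_outputs((30.0, -600.0, 0.0), drum_height=0.0))
```

In this example the rendered stick is still about 10 mm above the drum, while the longer sound lead time already predicts contact, so the sound command would be queued now to be heard 1/15 of a second later.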

6  Summary 

The unavoidable processing delays in computer systems mean that
synchronization of graphics and sound with user motion requires
prediction of the user's future position. We have shown how to construct
the optimal linear filter for estimating future user position, and
demonstrated that it gives better performance than the commonly
used technique of position smoothing plus velocity prediction. The
ability to produce accurate predictions can be used to minimize unexpected
delays by using them in a system of multiple asynchronous
processes with known, fixed lead times. Finally, we have shown that
the combination of optimal filtering and careful construction of system
communications can result in a well-synchronized, multi-modal
virtual environment.


Abstract

This paper introduces a predictor based visual feedback aid
for navigating through virtual environments using velocity
control. The predictor indicates to the user where and how
fast he or she is travelling and has a direct manipulation feel
to it. Experiences using the predictor to navigate over
digital terrain maps are discussed, which show it to be an
aid in learning to use velocity control and in creating
smooth flight paths over a thinned wire frame representation
of a scene for subsequent single frame animation.
Measurements of performance in using the predictor to fly
through a tube scene show a benefit for the less experienced
users.


Introduction 

For the past six years our work has focussed on
methods for exploring "fishtank" virtual environments.
These are not the full-blown environments with head
mounted displays, coupled to head position (Sutherland,
1968; Blanchard et al., 1990), but rather the (currently) far
more useful environments where the virtual 3D world is
perceived to be behind the monitor window. Given this
common configuration, the user requires a means to move
through the virtual environment and manipulate objects
within it; both of these are 6 degree of freedom (6DF)
tasks. Previous work on viewpoint manipulation in our
laboratory using the Bat input device has established
control over viewpoint velocity to be a preferred exploration
mode (Ware and Osborne, 1990). The Bat (like a mouse
that flies, or fledermaus) senses the user's hand position and
orientation. We use the button as a kind of engagement
device and while the button is held down relative position
and orientation is converted to viewpoint velocity;
translational position is converted to translational velocity
and orientation is converted to rotational velocity. We use
a quadratic function to map hand displacement to both
translational and rotational velocities and this gives control
through changes of scale of up to four orders of magnitude.
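
A quadratic mapping of this kind might be sketched in Python as below. The exact function is not given in the text, so the sign-preserving form and the `gain` parameter are assumptions for illustration.

```python
def displacement_to_velocity(displacement, gain=1.0):
    """Sign-preserving quadratic map from hand displacement to velocity."""
    return gain * displacement * abs(displacement)


# Displacements spanning two orders of magnitude ...
slow = displacement_to_velocity(0.01)
fast = displacement_to_velocity(1.0)
# ... give velocities spanning roughly four orders of magnitude.
print(fast / slow)
```

Squaring doubles the dynamic range measured in orders of magnitude, which is one plausible reading of how a comfortable range of hand motion can cover scale changes of up to four orders of magnitude.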


The  hand  position  is  computed  relative  to  the  6D 
coordinates  of  the  initial  change  to  the  button  down  state. 
Using  relative  position  in  this  way  has  advantages  and 
disadvantages.  It  allows  the  user  to  work  comfortably.  If 
the  user  finds  a  position  awkward,  letting  go  of  the  Bat 
button instantly stops motion; the hand can be then moved
to a more convenient position, usually fairly close to the
body without undue arm extension, and motion can be
resumed relative to this new position. The disadvantage of
the relative mode is that the user is not likely to remember
the starting position of the hand (button down transition).

If you knew where your hand was you could infer your
velocity.  As  it  is  there  are  only  visual  cues  available  from 
the  virtual  environment  about  the  current  viewpoint 
velocity  and  these  are  often  not  adequate,  especially  when 
the  environment  has  little  texture. 

The  present  project  was  initiated  to  develop  a 
viewpoint  navigation  aid  by  providing  the  user  with 
feedback on his or her current velocity. The most important
source of inspiration came from experimental heads-up
cockpit displays designed to illustrate the aircraft attitude in
the pilot's field of view. In some experimental studies it
has been found useful to display the aircraft's predicted
attitude in addition to the current aircraft attitude (Gallagher
et al., 1977; Kelley, 1968). Taking this a step further is the
"quickened" display which only shows the aircraft's future
position (for a discussion see Wickens, 1984).

The notion of quickening was especially attractive
to us since we felt it might give a direct object
manipulation feel to the interface. Even though the user is
in fact directly manipulating the current velocities, he or
she may feel that it is the predictor that is being
manipulated and the predictor shows a future position and
orientation based on extrapolation. Assuming success, the
user will feel in control over the predictor and, in a sense,
control over the future view point with a guaranteed
smooth transition from the current viewpoint. To arrive at a
particular location it will be only necessary to point the
predictor at it, and changes of orientation may be achieved
in a similar fashion. In this respect the display would be
like Mackinlay et al.'s (1990) technique for viewpoint
navigation relative to specified points on the surfaces of
objects, only without the necessity of tying navigation
directly to objects.

Figure 1. The predictor which is perceived at time Ti is based on the
predicted position of the viewpoint at time Ti+n. The streamers from the
corners of the predictor trace out the path of the predictor over the previous
frames.

Figure 2. The predictor is seen in use over a digital terrain map representing the North
Atlantic.

Predictor  design 

The aircraft problem and the fishtank interaction problem
are not exactly isomorphic. An aircraft has complex flight
dynamics whereas our interface was designed for complete
freedom of motion with ease of use being the only
consideration. We are able to move up, down,
forwards, backwards and sideways with equal facility.
Because a conventional predictor will not be visible except
in the case of forward motion we gave our predictor a
neutral point in front of the viewpoint, as illustrated in
Figure 1. To add velocity and trajectory feedback we added
tails to the four corners of the predictor frame. Although
these tails actually look like ribbons, they behave like
smoke trails. That is, they mark the course of the predictor
frame through space. Figure 2 illustrates the predictor
being used to create a motion path over a digital terrain
map.
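
The geometry described above can be sketched in Python. Everything here is an illustrative assumption: the predictor frame centre is placed at a neutral point a fixed distance ahead of the viewpoint and then displaced by the current velocity extrapolated over a fixed horizon, and each tail keeps a bounded history of one corner's past positions. The names `predictor_center` and `Tail` are hypothetical.

```python
from collections import deque


def predictor_center(position, velocity, forward, neutral_offset, horizon):
    """Neutral point ahead of the viewpoint plus velocity * horizon."""
    return tuple(p + f * neutral_offset + v * horizon
                 for p, v, f in zip(position, velocity, forward))


class Tail:
    """Keeps the most recent positions of one frame corner, like a smoke trail."""

    def __init__(self, length):
        self.points = deque(maxlen=length)

    def update(self, corner_position):
        """Record the corner's new position and return the current trail."""
        self.points.append(tuple(corner_position))
        return list(self.points)


# Viewpoint at the origin looking down -z, drifting along +x.
print(predictor_center((0, 0, 0), (1, 0, 0), (0, 0, -1), 2.0, 0.5))
```

With zero velocity the frame sits at the neutral point directly ahead, so it stays visible even when the user is not moving forward. Because the trail is bounded, old positions drop off the end of each tail as new ones are added.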


The first stages of predictor design were an iterative process
without formal evaluation. However, it is obvious to us
already that it is a valuable navigation aid and as anticipated
it has a direct manipulation feel to it. It has had additional
benefits which were not anticipated. Because our system
uses standard Z buffering for hidden surface elimination, it
gives a collision cue. The predictor can be seen to enter an
object, leaving its tail still visible, allowing for avoidance
action.


Uses 

Our first real application of the predictor is in virtual
camera control. We are involved in a major Canadian ocean
mapping project at UNB and we have created part of an
animated videotape for the Canada pavilion in the upcoming
World's Fair in Seville, Spain. We used the predictor with
the velocity control interface to create a motion path in real
time over a thinned wire frame representation of the
topographic data. We can then reuse the saved motion
path with single frame animation and high quality rendering
techniques to create the required movie.

We  are  also  building  the  predictor  into  a  data 
visualization  and  editing  system  for  oceanographic 
research. 

Evaluation 

Our experience in using the predictor to explore
various kinds of terrain data suggests that the predictor tails
help in providing feedback about velocity, smoothness and
direction of travel which is invaluable in the specification
of a motion path for a flyby animation. In this kind of
scene the terrain consists of a wire mesh which means that
the tails were always visible to the user. In addition, the
visual feedback from the predictor tails is especially useful
in graphically impoverished scenes, where it makes up for the
lack of visual motion parallax information (Gibson et al.,
1959).


[Plot; axis label: AvgTime]

Figure 3. At a particular time setting, the
presence of the predictor allows inexperienced
subjects to perform better. Experienced subjects
perform worse. See text for explanation.


We  are  beginning  a  series  of  formal  studies  to  evaluate 
various predictor parameters such as optimal extrapolation
time  and  streamer  length.  The  results  we  have  thus  far 
come  from  a  task  in  which  subjects  navigate  through  a 
tunnel  which  is  made  up  of  a  sequence  of  eight  curves  each 
having  a  different  radius.  Each  time  the  subject  does  the 
task  a  different  randomly  connected  sequence  of  curves  is 
used.  The  subject's  task  is  to  navigate  the  tunnel  as  fast  as 
possible  without  flying  through  a  wall.  We  measure  both 
time  to  completion  and  errors  under  the  three  conditions: 

No  predictor 
Predictor without tails
Predictor  with  tails. 

The most interesting results obtained to date are plotted in
Figure 3 which shows data from eight subjects. The
relative time to completion for the predictor without tails
condition is plotted against average time to completion. The
negative correlation shows that subjects who did the task
slowly (on average) did significantly better with the
predictor (they are represented by the five points below the
line), while subjects who did the task fast were actually
hindered in their performance of the task. The subjects who
did the task slowly were ones with no prior experience with
our velocity navigation system and they clearly benefited
from the presence of the predictor. The reason for the
degradation of performance with the more experienced
subjects became clear on detailed analysis. The speed with
which they navigated through the tube was such that the
predictor was projected right out of sight beyond the next
bend, most of the time. Because of this the subject only
occasionally obtained glimpses of the predictor which
proved to be a distraction rather than a help. It appears
likely that for experienced subjects the predictor should be
projected a shorter time into the future.

The data we have obtained thus far with the tails give a
confusing picture, which suggests that some subjects benefit
while other subjects find them to be a hindrance,
irrespective of experience. We are continuing our
investigation.

What  has  been  achieved 

We feel that the combination of Bat, fishtank environments
and predictor has immediate utility for scientific
visualization and CAD systems. It lacks many of the
motion constraints of full blown, head mounted virtual
reality while it allows for almost as much functionality,
although, of course, the feeling of immersion in the
graphical environment is absent (but this saves on Gravol).
There are now four Bat devices in or close to production:
the SimGraphics Flying Mouse™, the Ascension
Technologies Bird™, the Logitech™ 3D mouse, and the
Gyration GyroPoint™. In other studies we have found that
Bats are good for object manipulation (Ware and Jessome,
1988; Ware, 1990) and superior to the SpaceBall™ for 3D
navigation (Ware and Slipp, 1991).
Videotape

The videotape that accompanied this paper showed two
sequences:

1) The predictor is seen in use in the Duct Maze
environment used to evaluate performance. The manoeuvres
being carried out show how the predictor behaves when it is
flown in and out of walls.

2) The predictor is used in an interface which allows the
exploration of a digital terrain map of the North Atlantic
and the west coast of North America. When motion stops
the surface is rendered at successive levels of detail. The
colour coding of the surface illustrates gravity anomalies.

In the version illustrated in the videotape, the predictor tails
extend from 20 frames in the future to 10 frames into the
past. At a frame rate of 20 frames/second this yields a one
second predictor, which seems about right.
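As a rough illustration of the timing described above, the following Python sketch samples a predictor tail from 10 frames in the past to 20 frames in the future at 20 frames/second. The constant-velocity extrapolation and the names `tail_samples` and `lookahead_seconds` are assumptions made for this illustration; the text here does not state the extrapolation formula actually used.

```python
FRAME_RATE = 20.0    # frames per second, as stated in the text
FUTURE_FRAMES = 20   # the tail extends this many frames into the future
PAST_FRAMES = 10     # and this many frames into the past


def tail_samples(position, velocity):
    """Return (frame_offset, point) pairs along the predictor tail.

    Assumes constant velocity (units per second). This is an
    illustrative simplification, not the authors' stated method.
    """
    samples = []
    for frame in range(-PAST_FRAMES, FUTURE_FRAMES + 1):
        t = frame / FRAME_RATE  # seconds relative to the current frame
        point = tuple(p + v * t for p, v in zip(position, velocity))
        samples.append((frame, point))
    return samples


def lookahead_seconds():
    """Time between the current frame and the tip of the tail."""
    return FUTURE_FRAMES / FRAME_RATE
```

With these settings the tip of the tail lies one second ahead of the viewpoint, which matches the "one second predictor" mentioned in the text.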
Abstract

This paper presents a general system for camera movement
upon which a wide variety of higher-level methods and
applications can be built. In addition to the basic commands for camera
placement, a key attribute of the CINEMA system is the ability to
inquire information directly about the 3D world through which the
camera is moving. With this information high-level procedures can
be written that closely correspond to more natural camera
specifications. Examples of some high-level procedures are presented. In
addition, methods for overcoming deficiencies of this procedural
approach are proposed.


1. Introduction

Camera control is an integral part of any 3D interface. In
recent years a number of techniques for interactively specifying
camera movement have been implemented or proposed. Each of
these techniques has provided an interface for solving a problem
for a particular domain, but all of them have remained independent,
making it impossible to use them across domains. These domains
include keyframe based computer graphic animation techniques
[8, 11], navigation of virtual environments [1, 2, 9, 13], general
3D interaction [3, 12], automatic presentation [6] (in which
computers generate a presentation), and synthetic visual narratives [4]
(in which users author presentations). The CINEMA system
described in this paper is a camera protocol that supports camera
interface paradigms useful for all these domains, and provides a
framework on which new interfaces can be developed.

The CINEMA system has a procedural interface for specifying
camera movements relative to objects, events, and the general
state of an environment. This task level approach enables the
implementation of many common interactive metaphors and
provides the ability to build higher level parameterized procedures
that are reusable.

After a brief introduction to the problem, we will review
related work in camera control, and then describe the CINEMA
system, including the underlying support structure, the
implementation, and several examples that demonstrate the system. Finally,
we will discuss problems with this approach and suggest
alternatives based upon our findings. We will work under the assumption
that the actions in the environment are occurring independently
from the observer. By making this assumption, the specification of
the camera is independent from the 3D world, or can be treated as
a window into the world that does not impact on it. This
simplification is made by many of the existing camera interfaces reviewed in
this paper, and although limiting, it is appropriate for a variety of
situations.

An effective camera protocol must support interfaces that
investigate/explore and interfaces that present/illustrate the 3D
world. Although we have only begun to explore the uses of this
system, there are many applications in which it could be used. In
both scientific and architectural visualization there is the need to
explore the virtual environment interactively and then to later
author a set of illustrative camera movements to be shown to
clients or colleagues. In electronic books there will be the need for a
designer or knowledge based system to generate an interface
through which a reader can view the information. In the
entertainment industry an animator could use it to direct or specify camera
movements. Live action film makers may use it to create
interactive story boards of their scenes, plan camera movements, or even
to generate commands for motion controlled cameras. Telerobotic
or virtual environment applications require a task level camera
protocol in order to allow a human operator to efficiently and
intuitively control the view while performing or directing some remote
operation. All of these interfaces can be supported on top of the
camera protocol described in this paper.

2.  Previous  Work 

Early work in animation is devoted to making the movement
of the camera continuous and to developing the proper
representation for camera movements along a path [8, 11]. These works are
devoted to giving the animator greater control in creating smooth
movements and to finding ways to interpolate between user
specified keyframes. Although generating spline curves for camera
movement can produce smooth paths, it can be difficult to relate
the movements of the camera to objects in the environment.

With the advent of virtual environments and related 3D
interactive worlds, a great deal of effort has been spent on presenting
convenient metaphors through which to change the user's view of
an object or the world. A metaphor, as discussed in Ware et al. [12],
provides the user with a model that enables the prediction of
system behavior given different kinds of input actions. A good
metaphor is both appropriate and easy to learn. Some examples of
metaphors are the 'eyeball in hand' metaphor, the 'scene in hand'
or 'dollhouse' metaphor, and 'flying vehicle control.'

In the 'eyeball in hand' metaphor, a 6 degree of freedom
device is used to position and orient a camera by directly
translating and rotating the input device. Ware et al. found this method
somewhat awkward to use but easy to learn. The 'scene in hand'
metaphor allows the user to rotate and translate the scene based on
the position of the input device. This was found to be very
convenient for hand sized objects, but nearly impossible to use for
navigating inside closed spaces. Another scheme discussed by Ware et
al. was to control a simulated flying vehicle. The user's position
and orientation respectively affected the linear and angular
velocity of the camera viewpoint and direction of gaze. This metaphor
makes it easy to navigate, but difficult to examine a particular
object. Although 3D input devices such as a Polhemus Isotrack
system or a Spatial Systems Spaceball enable the user to specify 6
degrees of freedom simultaneously, simulations of these devices
can be done using only 2D devices [3].

Mackinlay et al. [9] discuss the problem of scaling camera
movements appropriately. They develop methods to select an
object of interest and to move exponentially towards or away from
the object. In this way, when the user is close to an object, the
viewpoint changes only a little, while when they are far from an
object, the viewpoint changes rapidly. By selecting a 'point of
interest,' the authors can reorient the camera to present a maximal view
of the desired object. The degrees of freedom are therefore
restricted and the user can concentrate more on the task of
navigating through the environment.
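The exponential movement described above can be sketched as moving the camera a fixed fraction of the remaining distance on every step, so that steps are large far from the object and small close to it. This is a minimal Python sketch under that assumption; the function names and the `fraction` parameter are hypothetical and are not taken from Mackinlay et al.

```python
import math


def step_toward(camera, target, fraction):
    """Move the camera a fixed fraction of the remaining distance.

    `fraction` (between 0 and 1) is an assumed tuning parameter.
    """
    return tuple(c + (t - c) * fraction for c, t in zip(camera, target))


def distances(camera, target, fraction, steps):
    """Distance to the target after each of `steps` successive moves."""
    out = []
    for _ in range(steps):
        camera = step_toward(camera, target, fraction)
        out.append(math.sqrt(sum((t - c) ** 2 for c, t in zip(camera, target))))
    return out
```

For a camera starting 16 units from the object and a fraction of 0.5, the remaining distance halves on each step, so the absolute step size shrinks as the camera approaches.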

Brooks [1, 2] developed several different methods for moving
around architectural simulations including steerable treadmills or
shopping carts with devices to measure the direction and speed of
movement.

The above work shows that different interfaces are
appropriate for different application requirements. In our view, no one
interface is ideally suited for all tasks, and a common underlying
structure on top of which several different metaphors can be
implemented would give the user a powerful tool to interact with 3D
environments.


An important ability is to allow the user to select an object of
interest  within  the  environment.  We  have  expanded  on  this  by 
allowing  the  user  to  make  general  queries  about  the  visibility  and 
orientation  of  objects  within  the  environment.  This  allows  the  user 
to  manipulate  camera  motion  based  on  the  actions  within  the  envi¬ 
ronment. 

Furthermore,  while  direct  manipulation  has  certain  advan¬ 
tages  in  interactive  systems,  there  are  several  deficiencies.  It  is  not 
necessarily  good  for  repetitive  actions,  and  any  action  that  requires 
a  great  deal  of  accuracy,  such  as  smooth  movement  for  cameras,  is 
not  necessarily  suited  to  input  using  one  of  the  metaphors  sug¬ 
gested  in  the  preceding  paragraphs.  Some  of  the  problems  inherent 
in using 6 DOF input devices presently available are the noise
inherent in user movements and the number of degrees of
freedom which must be simultaneously controlled. Textual systems,
with  interaction  built  on  top  of  them,  allow  both  a  high  level  input 
device  interface,  and  an  underlying  language  through  which  com¬ 
mands  can  be  specified  directly  or  generated  through  other  rule 
bases. 

An expert system for presentation, including the selection of
proper camera movements, is discussed in some detail by Karp and
Feiner [6]. In their ESPLANADE system (Expert System for
PLANning Animation Design and Editing), they emphasize the ability to
incorporate cinematic knowledge for the construction of coherent
descriptions of a scene. To do so, they need to have representations
of a database of objects and explicit events, along with a notion of
how frames, shots, scenes and sequences can be put together to
make an effective narrative. Their work emphasizes using a
knowledge based system for automatically selecting camera
placement and for choosing appropriate camera movements based on
cinematic considerations. Currently, they do not concentrate on the
movements themselves, but more on the initial placement of the
camera for shots and how to make transitions to other shots.

3.  The  CINEMA  System 

We have developed the CINEMA system to address the
problem of combining different paradigms for controlling camera
movements into one system. The CINEMA system is extensible,
permitting the user to build higher level procedures from simpler
primitives. It also provides the very important ability to make
inquiries into a database which contains information describing the
state of objects within a 3D environment. After the system was
developed, it was used by a dozen students in a course entitled
"Synthetic Cinematic and Cinematic Knowledge."¹ In this course
the students used this system to explore alternative ways of
animating one of several scenes. Although this system has mainly
been used for a synthetic narrative application, we feel that what
was learned is applicable to the other domains such as the
applications mentioned above.

The CINEMA system is divided up into two major parts. The
first is a database which contains information about objects, their
positions over time, and events over time. The second part is a
parser that accepts and interprets user commands. The user
commands are restricted to inquiries about the state of the database and
commands which query or affect the state of the camera.


¹ The course was given at the MIT Media Lab by Professors David
Zeltzer and Glorianna Davenport: two short [...] in January [...]
and [...], and two full semester courses in the Spring of [...]
and [...].
3.1 Support structure for CINEMA

To produce CINEMA's procedural interface it was necessary
to develop a set of primitive functions. There are three parts to this
support structure. First, there is a set of commands for moving the
camera or inquiring about the current camera state. Second is a set
of commands for inquiring about the state of the 3D world. Last is
a set of mathematical routines for manipulating the values returned
from the other functions.

There  are  two  sets  of  primitive  functions  for  changing  the 
camera  position  and  orientation.  The  lower  level  of  these  are  the 
commands  that  directly  set  the  x,  y,  and  z  positions  and  the  from, 
up,  and  at  vectors  that  are  so  commonly  used  in  computer  graph¬ 
ics.  The  slightly  higher  level  primitives  (but  still  part  of  the  support 
structure) perform simple camera moves like pan, tilt, roll, truck,
dolly, and crane [7]. In the film industry terms such as dolly and
truck are loosely used. For example, truck may be used to mean a
move  in  or  out,  or  a  move  from  side  to  side.  In  this  implementation 
we  have  chosen  one  of  the  possible  definitions  for  these  terms  to 
avoid  confusion.  The  conversion  between  the  computer  graphics 
vectors  and  the  film  standards  is  straightforward. 

Many  descriptions  of  how  to  film,  frame,  and  navigate  the 
scene  (by  both  screenwriter  and  layperson)  are  with  respect  to  the 
objects  in  the  world.  For  example  one  might  ask  for  the  camera  to 
move  alongside  object  A  while  looking  at  object  B.  An  interface 
that supports these descriptions must provide information about
events, geometric and spatial relationships such as position,
relative occlusion, direction of glance, and distances. For example,
functions like obj_visibility() and obj_obj_visibility() find
visibility  information  between  the  camera  and  an  object  or 
between  two  particular  objects.  Currently  this  is  implemented  by 
using  simple  ray  casting  with  bounding  box  intersection.  More 
sophisticated  techniques  can  be  used  to  provide  a  more  precise 
notion  of  visibility.  However,  this  implementation  has  proven  ade¬ 
quate  for  this  preliminary  research.  Other  functions  (like 
/Mffit>,gyencsfj)  are  provided  to  support  inquiry  into  discrete 
events  which  might  take  place  during  an  animation. 
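A visibility query of the kind described above (ray casting against bounding boxes) can be sketched as a segment versus axis-aligned box test. The following Python sketch uses the standard slab method; the function names, the use of axis-aligned boxes, and the choice of a single target point are assumptions for illustration and do not reproduce the CINEMA implementation.

```python
def segment_hits_box(start, end, box_min, box_max):
    """Slab test: does the segment start->end intersect the axis-aligned box?"""
    t_near, t_far = 0.0, 1.0  # parametric range of the segment still inside all slabs
    for s, e, lo, hi in zip(start, end, box_min, box_max):
        d = e - s
        if d == 0.0:
            # Segment is parallel to this slab: it must already lie inside it.
            if s < lo or s > hi:
                return False
            continue
        t0, t1 = (lo - s) / d, (hi - s) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def is_visible(camera, target, occluders):
    """True if no occluder box blocks the segment from camera to target.

    `occluders` is a list of (box_min, box_max) pairs; the target's own
    bounding box should not be included.
    """
    return not any(segment_hits_box(camera, target, lo, hi) for lo, hi in occluders)
```

As the text notes, a bounding box test is conservative: an object can be reported as hidden when only the occluder's box, not its actual geometry, lies on the line of sight.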

In addition to the commands described above, the system
provides a set of supporting mathematical commands, including both
scalar and vector calculations. These commands are needed to
manipulate the output of the inquiry commands. With these
functions, an inquiry about the state of the scene can be manipulated to
calculate new camera parameters (such as position, from, at and up
vectors). With combinations of these basic tools higher level
procedures can be built.

3.2  Implementation 

The entire system is currently implemented on 2 platforms:
an HP9000-835 turbo SRX in C using a public domain front end
language called Tcl [10], and on an Apple Macintosh. The
Macintosh platform cannot provide interactive update rates for rendered
images, but is successfully used for wireframe images.

3.3  Examples 

The following examples are representative of how the
CINEMA system was used in several different situations. The first
example shows how the CINEMA system is interfaced to a 3D
environment, and implements one of Ware et al.'s movement
metaphors. The second example shows how higher level camera
movements can be built from lower level primitives and inquiry
functions. Finally, example 3 shows the cinematic power of the
system in filming a simple animation.

Example  1:  CAMERA  MOVEMENT  METAPHOR:  This 
example  shows  how  a  3D  input  device  such  as  the  Isotrack  Polhe- 
mus  or  Spatial  Systems  Spaceball  can  be  used  to  change  the  view 
in  a  scene.  In  the  accompanying  video,  we  use  an  Ascension  Tech¬ 
nologies  Bird  to  control  the  x,  y,  and  z  position  of  the  camera  while 
always  looking  at  the  object  called  “joe."  This  is  very  similar  to  the 
“eyeball  in  hand"  movement  metaphor  discussed  by  Ware  et  al. 

The following pseudocode shows how this function is
implemented using the CINEMA system. The function consists of an
inquiry to the 6 DOF input device and then translating the camera
based upon the translation returned by the input device.

proc eyeball_in_hand(object) {
    (x,y,z) := get_input_from_device();
    cam_set_point(x,y,z);
    look_at(object);
}

Example  2:  EXTENSIBLE  LANGUAGE:  The  procedure 
"vertigo  shot"  simulates  Hitchcock’s  classic  shot  in  the  film  "Ver¬ 
tigo"  where  the  camera  moves  outwards  while  the  field  of  view 
grows  narrower  keeping  the  object  a  constant  size  at  the  center  of 
the  frame.  This  effect  makes  viewers  feel  as  if  they  are  moving 
closer  and  closer  to  an  unattainable  goal.  In  only  a  few  minutes  we 
constructed  the  following  procedure  to  make  a  vertigo  shot. 

proc vertigo_shot(obj, rate, no_frames) {

    /* get the angle subtended by the object */
    angle := get_angle_height(obj);

    /* get the camera's field of view */
    fov := cam_fov();

    /* compute the percentage of the fov */
    /* which the object subtends */
    percent := angle/fov;

    for (i=0; i<no_frames; i++) {

        /* truck in the specified direction */
        /* at the specified rate */
        cam_truck(rate);

        /* set the field of view so that */
        /* the object subtends the same */
        /* percentage as before */
        frame_it(obj, percent);
    }
}
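The geometry behind the vertigo shot can be checked with a small self-contained Python sketch. It assumes an object of a given height viewed head on from a given distance, and it mirrors the pseudocode by holding the ratio of subtended angle to field of view constant. The helper names are hypothetical and stand in for the CINEMA primitives; they are not the system's actual functions.

```python
import math


def subtended_angle(height, distance):
    """Angle in radians subtended by an object of `height` seen from `distance`."""
    return 2.0 * math.atan(height / (2.0 * distance))


def vertigo_shot(height, distance, fov, rate, no_frames):
    """Return a (distance, fov) pair for each frame of the shot.

    The camera trucks away by `rate` per frame, and the field of view is
    reset so that the object keeps the same share of it as at the start.
    """
    percent = subtended_angle(height, distance) / fov
    frames = []
    for _ in range(no_frames):
        distance += rate                                   # truck outwards
        fov = subtended_angle(height, distance) / percent  # re-frame the object
        frames.append((distance, fov))
    return frames
```

As the camera moves outwards the subtended angle shrinks, so the computed field of view narrows on every frame, which is the behaviour described in the text. Note that this keeps the angular ratio fixed, as the pseudocode does; the object's size on screen is only approximately constant for wide fields of view.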

Example 3: SYNTHETIC NARRATIVE: The last example
shows that the system can be used for simple cinematic teaching
purposes. An animation of a figure sitting down is filmed. A cut in
the middle of the animation changes the viewpoint from an oblique
view to a head on view. The views are selected so that a "match
cut" [5] is achieved. See sequence of frames at the end of the paper.

4.  Future  Work 

The CINEMA system needs to be extended to provide a
mechanism to easily combine and constrain multiple procedures.
For example, suppose a user would like to track the motion of a
walking figure while preventing the camera from moving through
walls. Ideally, one would like to have these procedures (one for
tracking and one for avoidance) automatically combined to
achieve the desired performance. Currently, it would be necessary
to construct a new procedure meeting both constraints.

The  ability  to  combine  procedures  would  allow  user  input  to 
be  treated  as  another  procedure  that  can  be  combined  with  other 
constraints.  Camera  movements  could  then  be  interactively 
adjusted  to  achieve  a  desired  result. 

To address some of these problems, we have already begun
exploration into constraint satisfaction techniques for camera
placement and movement. By specifying the camera's relationship
to other objects via weighted constraints, the system can find the
best position that satisfies certain criteria. These constraints are
maintained as the objects and the camera move throughout the
environment. Additional constraints can be placed on the
movement of the camera, so that the camera can have attributes of a
simulated physical object such as a fluid head.
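One simple way to picture camera placement by weighted constraints is to score candidate positions with a weighted sum of penalties and keep the lowest scoring one. The Python sketch below does this by brute force over a candidate list. The constraint types, the names, and the search strategy are all assumptions for illustration; the text does not describe the solver the authors are exploring.

```python
import math


def distance_constraint(target, desired):
    """Penalty for being nearer to or farther from `target` than `desired`."""
    return lambda cam: (math.dist(cam, target) - desired) ** 2


def height_constraint(desired_y):
    """Penalty for deviating from a preferred camera height (y coordinate)."""
    return lambda cam: (cam[1] - desired_y) ** 2


def best_position(candidates, weighted_constraints):
    """Return the candidate with the lowest weighted sum of penalties.

    `weighted_constraints` is a list of (weight, penalty_function) pairs.
    """
    def total(cam):
        return sum(w * penalty(cam) for w, penalty in weighted_constraints)
    return min(candidates, key=total)
```

Raising or lowering a weight changes which candidate wins, which is the sense in which the constraints are traded off against one another rather than all being satisfied exactly.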

5.  Conclusion 

The  CINEMA  system  provides  users  with  the  ability  to  rap¬ 
idly  experiment  with  various  camera  movement  paradigms.  Users 
can  create  new  camera  metaphors  or  extend  existing  ones.  The 
ability to inquire about the state of objects in the environment
provides support for more powerful camera movement procedures.

The CINEMA system has already proven quite useful in the
teaching domain. Students were able to use the CINEMA system
to explore different ways to film and present a simple animation,
and to plan a real camera shoot. The constraint satisfaction
methodology described above is an ongoing area of research. There are
many other areas to explore in camera movement systems,
including rule based generation systems, codifying stylistic attributes,
examining cuts, and interfacing with task oriented applications, to
name just a few. The hope is that once a strong support base for
camera positioning and movement is produced, further research in
these areas will be easier.

The CINEMA system makes it possible to experiment with
camera paradigms quickly and conveniently. We intend to continue
evolving the CINEMA system with an eye toward different
application domains including telerobotics/virtual environments and
synthetic narratives.
Abstract

This paper describes a technique for augmenting the
process of 3D direct manipulation by automatically
finding an effective placement for the virtual camera. Many
of the best techniques for direct manipulation of 3D
geometric objects are sensitive to the angle of view, and
can thus require that the user coordinate the placement
of the viewpoint during the manipulation process. In
some cases, this process can be automated. This means
that the system can automatically avoid degenerate
situations in which translations and rotations are difficult
to perform. The system can also select viewpoints and
viewing angles which make the object being manipulated
visible, ensuring that it is not obstructed by other
objects.

Introduction 

3D direct manipulation is a technique for controlling
positions and orientations of geometric objects in a 3D
environment in a non-numerical, visual way. Although
much research has been devoted to 3D direct
manipulation of geometric objects, no existing system has
adequately integrated the controls for viewing into the
direct manipulation process. Evans, Tanner, and Wein
[3], Nielson and Olson [6], and Chen et al. [1] all discuss
techniques for manipulation that are sensitive to the
viewing direction, but they do not address how the view
can be manipulated. Ware and Osborne [10] discuss the
viewing process in general, in terms of metaphors that
it suggests, and Mackinlay et al. [5] discuss an effective
technique for manipulating the viewpoint, both in
proximity to other objects and through large distances.
Neither of these relates the viewing process to direct
manipulation.

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.

© 1992 ACM 0-89791-471-6/92/0003/0071...$1.50

Our  direct  manipulation  system  includes  a  mecha¬ 
nism  for  automatically  placing  the  virtual  camera  at 
a  viewpoint  which  avoids  the  problems  with  degenerate 
axes  suffered  by  most  direct  manipulation  schemes.  The 
basic  idea  is  to  rotate  the  camera  through  small  angles 
to  achieve  a  better  view.  Our  system  also  rotates  the 
camera  to  avoid  viewing  obstructions.  This  viewing  op¬ 
eration  is  an  integral  part  of  the  manipulation  system, 
not  a  separate  viewing  facility  which  the  user  must  ex¬ 
plicitly  invoke. 

The  problem  of  automatic  viewing  placement  for  ma¬ 
nipulation  is  different  from  that  of  automatic  camera 
control  in  animation.  Karp  and  Feiner[4]  describe  a  sys¬ 
tem  called  ESPLANADE  that  automatically  visualizes 
simulations.  It  automatically  finds  camera  placements 
which  provide  a  good  view  of  movement  during  an  ani¬ 
mation.  This  is  an  adjunct  to  the  process  of  animation, 
not  an  interactive  technique. 


3D  Direct  Manipulation 

Several  techniques  have  been  developed  for  describing 
three  dimensional  transformations  with  a  two  dimen¬ 
sional  input  device  such  as  a  mouse  or  tablet.  Niel¬ 
son  and  Olson  [6]  describe  a  technique  for  mapping  the 
motion  of  a  two  dimensional  mouse  cursor  to  three  di¬ 
mensional  translations  based  on  the  orientation  of  the 
projection  of  a  world  space  coordinate  triad  onto  the 
screen. This mapping makes it difficult to translate
along an axis parallel to the line of sight, because the
axis projects onto a point on the screen instead of a
direction.

*Cary Phillips' current address: Pacific Data Images, 1111

Rotations  are  considerably  more  complex,  but  several 
techniques  have  been  developed,  with  varying  degrees 
of  success.  The  most  naive  technique  is  to  simply  use 
horizontal  and  vertical  mouse  movements  to  control  the 
world space Euler angles which define the orientation
of  an  object.  This  technique  provides  little  kinesthetic 
feedback  because  there  is  no  natural  correspondence  be¬ 
tween  the  movements  of  the  mouse  and  the  rotation  of 
the  object.  A  better  approach,  described  by  Chen  et 
al  [1],  is  to  make  the  rotation  angles  either  parallel  or 
perpendicular  to  the  viewing  direction.  This  makes  the 
object  rotate  relative  to  the  graphics  window,  providing 
much  greater  kinesthetic  feedback,  but  it  also  makes  the 
available  rotation  axes  highly  dependent  on  the  viewing 
direction. 

3D  Manipulation  in  Jack 

Our interactive system is called Jack¹, and it is
designed for modeling, manipulating, animating, and
analyzing human figures, principally for human factors
analysis. The 3D direct manipulation facility in Jack
allows the user to interactively manipulate figure positions
and orientations, and joint angles subject to limits [7].
Jack also has a sophisticated system of manipulating
postures through inverse kinematics and behavior
functions [8, 9]. Jack runs on Silicon Graphics IRIS
workstations, and it uses a three button mouse to control
translation and rotation. Within the direct
manipulation process, the user can toggle between rotation and
translation, and between the local and global coordinate
axes, by holding down the CONTROL and SHIFT keys,
respectively.

With translation, the user controls the movement by
moving the mouse cursor along the line which the
selected axis makes on the screen. This is similar to the
projected triad scheme of Nielson and Olson [6], and it
ensures good kinesthetic correspondence. Pairs of
buttons select pairs of axes and translate in a plane. A 3D
graphical translation icon located at the origin of the
object being manipulated illustrates the selected axes
and the enabled directions of motion.

The  user  can  control  rotation  around  the  x,  y,  and 
z  axes,  in  either  local  or  global  coordinates.  Only  one 
axis  can  be  selected  at  a  time.  A  graphical  wheel  icon 
illustrates the origin and direction of the axis. The user
controls  the  rotation  by  moving  the  cursor  around  the 
perimeter  of  the  rotation  wheel,  causing  the  object  to 
rotate  around  the  axis.  This  is  analogous  to  turning 
a  crank  by  grabbing  the  perimeter  and  dragging  it  in 
circles. This is somewhat similar to Evans, Tanner and
Wein’s turntable technique [3], but it provides greater
graphical feedback.

Jack is a trademark of the University of Pennsylvania.

Drawbacks 

A  drawback  of  the  manipulation  technique  in  Jack  is  the 
inability  to  translate  an  object  along  an  axis  parallel  to 
the  line  of  sight,  or  to  rotate  around  an  axis  perpendic¬ 
ular  to  the  line  of  sight.  In  these  cases,  small  differences 
in  the  screen  coordinates  of  the  mouse  correspond  to 
large  distances  in  world  coordinates,  which  means  that 
the  object  may  spin  suddenly  or  zoom  off  to  infinity. 
This  is  an  intrinsic  problem  with  viewing  through  a  2D 
projection;  kinesthetic  correspondence  dictates  that  the 
object’s  image  moves  in  coordination  with  the  input  de¬ 
vice,  but  if  the  object’s  movement  is  parallel  to  the  line 
of  projection,  the  image  doesn’t  actually  move,  it  only 
shrinks  or  expands  in  perspective. 

In  the  past,  we  adopted  the  view  that  the  first  prereq¬ 
uisite  for  manipulating  a  figure  is  to  position  the  camera 
in  a  convenient  view.  Although  the  viewpoint  manip¬ 
ulation  techniques  in  Jack  are  quite  easy  to  use,  this 
forced the user through an additional step in the manipulation
process, and the user frequently moved back and
forth  between  manipulating  the  object  and  camera. 

3D  Viewing 

The  computer  graphics  workstation  provides  a  view  into 
a  virtual  3D  world.  It  is  natural  to  think  of  a  graphics 
window  as  the  lens  of  a  camera,  so  the  process  of  ma¬ 
nipulating  the  viewpoint  is  analogous  to  moving  a  cam¬ 
era  through  space.  Evans,  Tanner,  and  Wein  describe 
viewing  rotation  as  the  single  most  effective  depth  cue, 
even  better  than  stereoscopy  [3].  In  order  for  an  inter¬ 
active  modeling  system  to  give  the  user  a  good  sense 
of  the  three-dimensionality  of  the  objects,  it  is  essential 
that  the  system  provide  a  good  means  of  controlling  the 
viewpoint. 

Control  over  the  viewpoint  is  especially  important 
during  the  direct  manipulation  process,  because  of  the 
need  to  “see  what  you  are  doing.”  The  whole  notion  of 
direct  manipulation  requires  that  the  user  see  what  is 
happening,  and  feel  the  relationship  to  the  movement  of 
the  input  devices.  If  the  user  can’t  see  the  object,  then 
he  or  she  certainly  can’t  manipulate  it  properly. 

Jack uses Ware and Osborne’s camera in hand
metaphor [10] for the view. The geometric environment
in human factors analysis problems usually involves
models of human figures in a simulated workspace. The
most appropriate cognitive model to promote is one of
looking in on a real person interacting with real, life-size
objects. Therefore, Jack suggests that the controls on
the viewing mechanism more or less match the controls
we have as real observers, who move side to side and up and
down while staying focused on the same point.

The viewing adjustments in Jack are easy to invoke
from within the direct manipulation process, and this is
a very common thing to do. The typical way of performing
a manipulation is to intersperse translations and rotations
with viewing adjustments, in order to achieve a
better view during the process. The context switch between
viewing and manipulation is very easy to make.

Automatic  Viewing  Adjustments 

Much  of  this  viewing  adjustment  as  an  aid  to  manipula¬ 
tion  can  be  automated,  in  which  case  the  system  auto¬ 
matically  places  the  camera  in  a  view  which  avoids  the 
problems  of  degenerate  axes.  This  can  usually  be  done 
with  a  small  rotation  to  move  the  camera  away  from 
the  offending  axis.  This  automatic  camera  rotation  can 
even  be  helpful  by  itself,  because  it  provides  a  kind  of 
depth  cue. 

To prevent degenerate movement axes from causing
problems during direct manipulation, Jack uses a
threshold angle between the movement axis and the line of
sight, beyond which it will not allow the user to manipulate
an object. To do so would mean that small
movements of the mouse would result in huge translations
or rotations of the object. This value is usually
20°, implying that if the user tries to translate along
an axis which is closer than 20° to the line of sight,
Jack  will  respond  with  a  message  saying  “can’t  trans¬ 
late  along  that  axis  from  this  view,”  and  it  will  not  allow 
the  user  to  do  it.  The  same  applies  to  rotation  around 
axes  perpendicular  to  the  line  of  sight.  In  these  cases, 
the  rotation  wheel  projects  onto  a  line,  so  the  user  has 
no  leverage  to  rotate  it. 

The  automatic  viewing  adjustment  invokes  itself  if  the 
user  selects  the  same  axis  again  after  getting  the  warn¬ 
ing  message.  Jack  will  automatically  rotate  the  camera 
so  that  its  line  of  sight  is  away  from  the  transforma¬ 
tion  axis.  To  do  this,  it  orients  the  camera  so  that  it 
focuses  on  the  object’s  origin,  and  then  rotates  the  cam¬ 
era  around  both  a  horizontal  and  a  vertical  axis,  both 
of  which  pass  through  the  object’s  origin.  The  angles 
of  rotation  are  computed  so  that  the  angular  distance 
away from the offending axis is at least 20°.
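
The threshold test described above can be sketched in a few lines of Python. This is only an illustration: the function names and the tuple representation of vectors are assumptions made for the example and are not taken from Jack. Only the 20° default comes from the text.

```python
import math

def angle_to_line_of_sight(axis, view_dir):
    """Unsigned angle in degrees between an axis and the line of sight.

    An axis has no preferred direction, so the angle is folded
    into the range [0, 90] degrees by taking the absolute dot product.
    """
    dot = sum(a * v for a, v in zip(axis, view_dir))
    norm = math.sqrt(sum(a * a for a in axis)) * math.sqrt(sum(v * v for v in view_dir))
    cos_angle = min(1.0, abs(dot) / norm)  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))

def can_translate(axis, view_dir, threshold_deg=20.0):
    """Translation is refused when the axis is nearly parallel to the line of sight."""
    return angle_to_line_of_sight(axis, view_dir) >= threshold_deg

def can_rotate(axis, view_dir, threshold_deg=20.0):
    """Rotation is refused when the axis is nearly perpendicular to the line of sight,
    because the rotation wheel then projects onto a line."""
    return 90.0 - angle_to_line_of_sight(axis, view_dir) >= threshold_deg
```

With the camera looking down the z axis, `can_translate((0, 0, 1), (0, 0, -1))` is false while `can_rotate((0, 0, 1), (0, 0, -1))` is true, which matches the two degenerate cases discussed in the Drawbacks section.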

This  technique  maintains  the  same  distance  between 
the  camera  and  the  object  being  manipulated.  In  gen¬ 
eral,  this  “zoom  factor”  is  much  more  subjective  and  is 
difficult  for  the  system  to  predict.  In  practice,  we  have 
found  it  best  to  require  the  user  to  control  this  quantity 
explicitly. 

The  reason  for  the  repeated  axis  selection  is  to  ensure 
that the user didn’t select the axis by mistake. It is
common  to  position  the  view  parallel  to  a  coordinate 
axis  to  get  a  2D  view  of  an  object.  If  the  user  likes  this 
view, then it would be wrong to disturb it. For example,
if the user positions the view parallel to the z axis to
get  a  view  of  the  xy  plane,  and  then  accidentally  hits 
the  right  mouse  button,  the  view  will  not  automatically 
change  unless  the  user  confirms  that  this  is  what  he  or 
she  wants  to  do. 

Automatic  view  positioning  also  takes  place  when  the 
object  is  not  visible.  This  may  mean  that  the  object  is 
not  visible  at  all,  or  only  that  its  origin  is  not  visible. 
For  example,  a  human  figure  may  be  mostly  visible  but 
with  its  foot  off  the  bottom  of  the  screen.  In  this  case,  a 
command  to  move  the  foot  will  automatically  reposition 
the  view  so  that  the  foot  is  visible. 

Smooth  Viewing  Transitions 

Both  the  horizontal  and  vertical  automatic  viewing  ro¬ 
tations  occur  simultaneously,  and  Jack  applies  them  in¬ 
crementally  using  a  number  of  intermediate  views  so 
the  user  sees  a  smooth  transition  from  the  original  view 
to  the  new.  This  avoids  a  disconcerting  snap  in  the 
view.  Jack  applies  the  angular  changes  using  an  ease 
in/ease  out  function  which  ensures  that  the  transition 
is  smooth. 

The  procedure  for  rotating  the  camera  is  sensitive  to 
the  interactive  frame  rate  so  that  it  provides  relatively 
constant  response  time.  If  the  camera  adjustment  were 
to  use  a  constant  number  of  intermediate  frames,  the 
response  time  would  be  either  too  short  if  the  rate  is  fast 
or  too  long  if  the  rate  is  slow.  Jack  keeps  track  of  the 
frame  rate  using  timing  information  available  from  the 
operating system in 1/60ths of a second. We compute
the  number  of  necessary  intermediate  frames  so  that  the 
automatic  viewing  adjustment  takes  about  1  second  of 
real  time. 
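
A minimal sketch of this frame-rate-sensitive transition follows. The text does not give the ease in/ease out function, so the cosine curve used here is an assumption, as are the function names. The 1/60 second timer resolution and the one second target duration come from the text.

```python
import math

def frame_count(frame_time_ticks, duration_s=1.0, ticks_per_second=60):
    """Number of intermediate frames so the transition lasts about duration_s.

    frame_time_ticks is the measured time of one frame in timer ticks.
    """
    frame_time_s = frame_time_ticks / ticks_per_second
    return max(1, round(duration_s / frame_time_s))

def ease_in_out(t):
    """Assumed easing curve: 0 at t=0, 1 at t=1, zero slope at both ends."""
    return 0.5 - 0.5 * math.cos(math.pi * t)

def transition_angles(total_deg, frame_time_ticks):
    """Cumulative rotation angle to apply at each intermediate frame."""
    n = frame_count(frame_time_ticks)
    return [total_deg * ease_in_out(i / n) for i in range(1, n + 1)]
```

At 10 frames per second (6 ticks per frame) a 20° adjustment is spread over 10 views, while at 30 frames per second it is spread over 30, so the adjustment takes roughly the same real time in both cases.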

Avoiding  Viewing  Obstructions 

When manipulating an object using solid shaded graphics,
it can be especially difficult to see what you are
doing because of the inability to see through other objects.
In some situations, this may be impossible to
avoid, in which case the only alternative is either to
proceed without good visibility or revert to a wireframe
image. Frequently, however, it may be possible to automatically
change the view slightly so that the object
is  less  obstructed.  To  do  this,  we  borrow  an  approach 
from  radiosity,  the  hemicube  [2]. 

The hemicube determines the visibility of an entire
geometric environment from a particular reference
point,  and  we  can  use  this  information  to  find  an  un¬ 
obstructed  location  for  the  camera  if  one  exists.  We 
perform  the  hemicube  computation  centered  around  the 
origin of the object being manipulated, but oriented towards
the current camera location. This yields a visibility
map of the entire environment, or what we would see
through a fish-eye lens looking from the object’s origin
towards  the  camera.  If  the  camera  is  obstructed  in  the 
visibility  map,  we  look  in  the  neighborhood  of  the  direc¬ 
tion  of  the  camera  for  an  empty  area  in  the  hemicube 
map.  This  area  suggests  a  location  of  the  camera  from 
which the object will be visible. From this, we compute
the angles through which the camera should be
rotated.  We  generate  the  hemicube  map  using  the  hard¬ 
ware  shading  and  z-buffer,  so  its  computation  is  quite 
efficient. 
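
The neighborhood search can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it handles only one square face of the cube, centered on the current camera direction, and represents the visibility map as a grid of booleans (True meaning occluded). The function name, the grid layout, and the conversion of a cell position to two rotation angles are all hypothetical choices for this example.

```python
import math

def find_camera_rotation(occluded):
    """Return (horizontal, vertical) rotation in degrees toward the nearest
    unoccluded cell of an n-by-n face, or None if every cell is occluded.

    The face is assumed to sit at unit distance from the object's origin and
    to span [-1, 1] in both directions, with the camera direction at its center.
    """
    n = len(occluded)
    best = None
    for r in range(n):
        for c in range(n):
            if occluded[r][c]:
                continue
            # Cell center in face coordinates.
            u = (c + 0.5) / n * 2.0 - 1.0
            v = (r + 0.5) / n * 2.0 - 1.0
            d2 = u * u + v * v
            if best is None or d2 < best[0]:
                best = (d2, u, v)
    if best is None:
        return None
    _, u, v = best
    horizontal = math.degrees(math.atan2(u, 1.0))
    vertical = math.degrees(math.atan2(v, math.hypot(u, 1.0)))
    return horizontal, vertical
```

If the center cell is already clear the result is a zero rotation. A `None` result corresponds to the enclosed-environment case, in which the map has no holes at all.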

This  type  of  hemicube  is  somewhat  different  from  the 
type used in radiosity because it is not necessarily centered
around  the  surface  of  an  object.  In  fact,  it  need  not 
be  associated  with  a  surface  at  all,  as  when  the  direct 
manipulation  operation  is  applied  to  a  shapeless  entity 
like  a  3D  control  point  or  a  goal  point  for  an  inverse 
kinematics operation. Therefore, our hemicube is actually
not “hemi” at all, since we use all six sides of the
cube.  In  cases  when  the  direct  manipulation  operation 
is  moving  a  geometric  object,  it  is  convenient  to  omit 
the  object  from  the  hemicube  visibility  computation  al¬ 
together.  Otherwise,  most  of  the  visibility  map  will  be 
filled  up  with  the  object  itself,  even  though  it  is  usually 
quite  acceptable  to  manipulate  an  object  from  a  view 
opposite  its  coordinate  origin. 

In  our  current  implementation,  the  hemicube  main¬ 
tains  only  occlusion  information,  not  depth  information. 
Therefore,  it  will  fail  to  find  suitable  camera  locations 
in  an  enclosed  environment.  In  such  cases,  there  are  no 
holes  in  the  visibility  map  at  all,  although  there  may  be 
regions only occluded by very distant objects. These
very  distant  objects  don’t  matter  unless  we  were  con¬ 
sidering  placing  the  camera  very  far  away.  A  better 
approach  would  be  to  retain  depth  information  in  the 
hemicube and search for a camera position which is unobstructed
only between the camera and the object, allowing
the distance between the object and the camera
to change as necessary, possibly causing the camera to
move in front of other objects.


Conclusion 

The control of a virtual camera is vitally important to
many techniques for 3D direct manipulation, although
no one has previously addressed the two issues
in  an  integrated  manner.  Our  technique  for  automati¬ 
cally  adjusting  the  view  in  conjunction  with  direct  ma¬ 
nipulation  has  been  implemented,  and  it  is  an  effective 
addition  to  the  manipulation  process.  The  automatic 
viewing  rotations  are  usually  very  small  so  they  do  not 
interject  large  changes  to  the  user’s  view  of  the  geomet¬ 
ric  environment.  Since  the  viewing  adjustments  are  only 
activated  on  the  second  attempt  at  movement  along  a 
degenerate axis, the adjustments are seldom invoked
accidentally, minimizing the degree to which the adjustments
are inappropriate.

ABSTRACT

This  paper  describes  a  hardware  design  for  antialiasing  both 
lines  and  polygons.  The  hardware  prefilters  lines  and  poly¬ 
gons  defined  on  a  high  resolution  grid  at  one-eighth  the  pixel 
spacing.  The  resolution  of  this  sub-pixel  grid  is  based  on  the 
limits of human visual perception. The antialiasing filters can
extend over a 1-by-1 or 2-by-2 pixel domain for polygons and
a  3-by-3  pixel  domain  for  lines.  The  design  uses  regular 
decomposition  and  the  symmetry  of  antialiasing  filters  to 
minimize  the  size  of  the  filter  tables.  The  resulting  hardware 
is  surprisingly  small  and  very  efficient  (typically  one  cycle  per 
output  pixel).  It  is  therefore  suitable  for  antialiasing  lines  and 
polygons  at  real-time  or  interactive  rates. 

CR  CATEGORIES  AND  SUBJECT  DESCRIPTORS: 

I.3.1 [Computer Graphics]: Hardware Architecture - Raster
display devices; I.3.3 [Computer Graphics]: Picture/Image
Generation - Display algorithms.

ADDITIONAL  KEYWORDS  AND  PHRASES: 
Antialiasing, prefiltering, real-time graphics.

1 INTRODUCTION

Antialiasing  is  a  desirable  feature  for  interactive  graphics,  but 
it  is  not  currently  available  without  cost  or  performance 
compromises.  Although  today’s  workstations  can  draw 
antialiased  vectors  at  high  speed,  rendering  antialiased  poly¬ 
gons  with  workstations  imposes  a  performance  degradation 
proportional  to  the  number  of  samples  per  pixel  calculated  for 
antialiasing  (supersampling).  Antialiased  polygons  at  real¬ 
time  rates  are  available  only  on  flight  simulators,  but  flight 
simulators have their drawbacks. Flight simulators are much
more  expensive  than  workstations  and  they  are  not  general 
purpose  platforms. 

We begin with a review of existing prefiltering techniques
with an emphasis on those techniques potentially suitable for
hardware. We then describe our approach in detail. Finally,
we  present  some  thoughts  on  how  this  approach  to  antialiasing 
can  be  integrated  with  hidden  surface  removal. 

2  EXISTING  METHODS  FOR  ANTIALIASING 

2.1  Lines 

A common technique for antialiasing lines models the geometry
of a finite width line and pixel by a single parameter: the
distance  from  the  center  of  the  pixel  to  the  center  of  the  line 
[13].  This  single  parameter  is  then  used  as  the  index  into  a 
precomputed  table  of  filter  results  (i.e.,  precomputed  convo¬ 
lutions).  While  this  single  parameter  model  is  exact  suffi¬ 
ciently  far  from  the  endpoints,  it  requires  a  correction  near  the 
endpoints.  Furthermore,  modeling  lines  as  finite  width  enti¬ 
ties  causes  other  problems  near  endpoints.  When  lines  are 
connected end-to-end, overlapping can cause intensity errors
at  endpoints  (see  Fig.  1). 


Fig.  1.  Overlapping  with  different  line  endpoint  shapes 
-  the  original  line  (a),  with  cut-off  (b),  rectangular  (c), 
and  rounded  (d)  endpoints. 

One  solution  to  these  overlapping  effects  is  to  solve  the  hidden 
surface problem for finite width lines [28]. Alternatively, lines
can  be  modeled  as  infinitely  thin,  so  that  the  antialiasing  filter 
itself  gives  thickness  to  the  line  and  “shape”  to  the  endpoints. 
Accurate  handling  of  endpoints  is  crucial  for  rendering  curves 
using  a  sequence  of  short  line  segments  [19]. 

2.2  Polygons 

Polygons  are  commonly  antialiased  using  supersampling  [8]. 
Unfortunately, with supersampling the number of samples
required to eliminate aliasing artifacts is significant. Although
non-uniform sampling requires fewer samples than
regular sampling, even with optimized non-uniform sampling
the number of samples per pixel necessary for high quality is
on the order of 25 ([17]) to 40 ([14]). This forces either a
performance degradation with respect to aliased rendering (as
with  workstations)  or  more  hardware  (as  in  flight  simulators). 

The  alternative  to  supersampling  is  prefiltering.  To  place  the 
technique described here in perspective, we review several
prefiltering  methods  published  in  the  literature. 

Catmull [4] introduced the technique of calculating visible
area as a method of antialiasing. This area calculation is
straightforward and relatively fast. Unfortunately, using
visible area for antialiasing is equivalent to using an unweighted
filter, and unweighted filters are far from the optimal filter
shape [16].

Feibush,  Levoy  and  Cook  [7]  described  a  method  for 
prefiltering polygon edges. Their method begins by clipping
a  polygon  to  the  filter  domain  surrounding  each  pixel.  The 
resulting  clipped  polygon  is  decomposed  into  several  right 
triangles,  two  for  each  edge  in  the  clipped  polygon.  The 
filtered  contribution  for  each  of  these  triangles  is  obtained 
from a small table, and the individual contributions are accumulated
(with sign) to yield the final filter result. They
mention two extensions: one for filters that are not circularly
symmetric  (a  three  parameter  table),  and  one  that  uses  the 
coordinates  of  edge  endpoints  to  look-up  the  filter  result  with 
one  table  access  per  edge  instead  of  two  (a  four  parameter 
table).  The  drawbacks  of  this  approach  are  that  it  is  oriented 
toward  circularly  symmetric  filters  and  that  it  is  slow.  For  the 
basic  two  parameter  table,  it  requires  calculations  to  deter¬ 
mine  the  base  and  height  of  the  right  triangles  into  which  the 
clipped  polygon  is  decomposed  followed  by  several  table 
look-ups  (at  least  six).  The  four  parameter  table  would  require 
a  table  as  large  as  that  used  in  our  method,  yet  there  are  still 
several  table  look-ups  per  pixel.  In  contrast,  the  method  we 
propose  requires  only  simple  reflections  on  the  clipped  poly¬ 
gon  fragments  and  typically  only  one  table  look-up  per  pixel. 

Abram,  Westover,  and  Whitted  [1]  proposed  a  method  requir¬ 
ing  only  one  look-up  for  many  cases.  After  clipping  the  visible 
portion of a polygon to the filter domain surrounding a pixel,
they  directly  look-up  the  result  for  cases  where  no  edge 
endpoint  is  within  the  filter  domain.  When  one  or  more 
vertices  lie  within  the  filter  domain,  the  method  reverts  to 
using  sub-pixel  bit  masks.  While  it  is  claimed  that  this  causes 
only  an  “unnoticeable  degradation”  at  64  sub-pixel  mask  bits 
per pixel, it does require perturbing the table values and using
sub-pixel bit masks. In contrast, our method decomposes
polygons  into  fragments  whose  shape  is  well-constrained  (so 
there  are  no  special  cases)  and  it  explicitly  allows  vertices  to 
lie  within  the  filter  domain. 

Lobb [15] described a method for prefiltering, restricted again
to  filters  possessing  circular  symmetry.  His  method  is  much 
like  line  antialiasing  in  that  it  is  simple  and  exact  sufficiently 
far from vertices. The filter response near a vertex is approximated
with a claimed error of less than 4%.

Duff  [6]  extended  the  trapezoidal  decomposition  generally 
used  to  calculate  area  for  antialiasing  to  a  method  for  comput¬ 
ing  the  exact  (to  floating  point  accuracy)  convolution  for  non- 
uniform  filters  (particularly  polynomial  splines).  He  men¬ 
tions  the  possibility  of  storing  convolution  results  in  tables. 
This  would  require  one  to  three  table  look-ups  while  our 
approach  requires  only  one.  Still,  his  method  is  very  efficient 
as  a  software  algorithm. 

Schilling  [25]  described  a  method  which  uses  tables  to  deter¬ 
mine  sub-pixel  masks.  Schilling  bases  his  table  look-up  upon 
edge  slope  and  distance  from  pixel  center,  although  other 
parameters such as intersection location along the filter domain
boundary (as [1] used) are equivalent. The unique
feature  of  Schilling’s  method  is  that  the  table  turns  on  sub¬ 
pixel  bits  according  to  polygon  area  rather  than  explicit 
geometric coverage of the sub-pixel sampling point. Consequently,
some of the bits turned on are outside the polygon!
The  mask  for  a  convex  polygon  is  the  logical  AND  of  the 
masks  for  each  of  its  edges.  The  effect  of  ANDing  masks  that 
have  samples  outside  the  polygon  was  not  discussed.  This 
method  is  similar  to  a  technique  used  in  some  flight  simula¬ 
tors,  in  which  edge  parameters  are  used  to  look-up  the  sub- 
pixel  mask  for  non-uniform  sampling.  Fig.  2  diagrams  this 
technique.  Any  two  of  the  four  parameters  (slope,  distance  to 
pixel center, x-intercept, y-intercept) can serve as the index
into  a  table  containing  the  mask  for  an  edge. 


Fig. 2. Table look-up of sub-pixel mask bits


Overall,  none  of  the  existing  methods  is  entirely  satisfactory. 
Each  has  one  or  more  drawbacks: 

•  Filter  shape  is  restricted  to  unweighted  or  circularly 
symmetric  functions  ([4],  [7],  [15],  [25]). 

• The method is slow, requiring several look-ups per pixel
or other calculations ([4], [7], [15], [6]).

•  The  method  is  approximate  near  vertices  ([1],  [15], 
[25]). 

• The method is simply a way to maintain sub-pixel masks
([1], [25]).

The first drawback is important because an unweighted filter
leaves  too  many  residual  artifacts,  and  a  circularly  symmetric 
filter  can’t  provide  uniform  total  field  response  [17]  (also 
called  the  constant  energy  criteria  [29]  or  zero  sampling 
frequency  ripple  [16]).  The  last  drawback  is  important  be¬ 
cause  the  number  of  bits  needed  in  a  sub-pixel  mask  is  at  least 
32  for  high  quality  antialiasing.  To  store  an  arbitrary  filter 
function in a table that can be indexed by sub-pixel mask
becomes difficult for high resolution masks because the table
size grows exponentially with the number of sub-pixel mask
bits (16 bits require a 64K word table, but 32 bits require a 4
Giga-word table!). Furthermore, since prefiltering must be
used for lines, resorting to sub-pixel masks forces line and
polygon antialiasing to be somewhat inconsistent. In contrast,
the method proposed here offers:

• Arbitrary filters defined over large domains.

• High image quality (from the 1/8 pixel grid resolution).

• Hardware speed (one table look-up per output pixel).

•  Hardware  simplicity. 

•  Uniform  antialiasing  of  both  lines  and  polygons. 
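
The table sizes quoted earlier for mask-indexed filter tables follow directly from the fact that a table indexed by an n-bit mask needs one entry per possible mask value. The short Python check below reproduces the two figures. The function name is only illustrative.

```python
def mask_table_entries(mask_bits):
    """Entries needed for a table indexed directly by a sub-pixel bit mask."""
    return 2 ** mask_bits

K = 1024
G = 1024 ** 3

entries_16 = mask_table_entries(16)  # 64K words
entries_32 = mask_table_entries(32)  # 4 Giga-words
```

Each additional mask bit doubles the table, which is why a direct look-up stops being practical well before the 32 bits cited as the minimum for high quality.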


3  HARDWARE  FOR  PREFILTERING 

Our  hardware  implementation  of  prefiltering  allows  arbitrary 
filters  defined  over  a  1-by-l  or  2-by-2  pixel  domain  for 
polygons  and  a  3-by-3  domain  for  vectors  (Fig.  3).  Input 
primitives  are  described  by  the  polygon  vertices  or  line 
endpoints.  These  points  are  specified  to  a  resolution  of  1/8  the 
pixel spacing. Intersections (for clipping) of lines or polygon
edges with the boundaries of filter domains are calculated to
the same accuracy. The high resolution of this grid is important.
At lower resolutions many implementations are possible
but they can't result in high quality antialiasing. This resolution
was chosen because it matches the ability of human visual
perception to infer sub-pixel position from greyscale (i.e.,
antialiasing) information [18]. By comparison, prefiltering at
a resolution of 1/8 pixel is more accurate than supersampling
on an 8-by-8 regular grid (64 samples per pixel). The
consequences of using a coarser grid (or fewer samples per
pixel) have been discussed in a previous paper [11].


Fig. 3. Filter domains for lines and polygons.

To use tables for prefiltering requires transforming input
primitives into a form simple enough to allow the filter table
to be of reasonable size. Our approach is comprised of three
steps for both lines and polygons:

(1) Regular decomposition.

(2) Efficient clipping to the filter domains.

(3) Encoding of intersection shape.

For lines, the decomposition stage simply orders the endpoints
in top-to-bottom order. For polygons the decomposition
is a conventional trapezoidal decomposition of input
polygons. The second step clips the input line or trapezoids
from the polygon decomposition to the filter domain at each
pixel. It organizes the process of intersection calculation as a
sequence of interpolations along edges. Interpolations are
calculated only once and shared between neighboring filter
domains. The third step reduces the plethora of clipped line
segments and polygon fragments into a set small enough for
direct table look-up. It uses the horizontal and vertical
symmetry of antialiasing filters to reflect clipped fragments
into a canonical form. This reduces the table size for lines by
a factor of 8, and for polygons by a factor of 4.

3.1 Decomposition and Clipping of Lines

The decomposition stage for lines simply orders the endpoints
in top-to-bottom order. Chained lines are broken up into
groups of line segments that can be processed in top-to-bottom
order.

The result of clipping a line to a filter domain is clearly a
segment of the original line. If the filter domains were single
pixels, the clipping would consist of simply slicing the line up
into pieces, each of which lies on a single scanline, and then
slicing those pieces horizontally for each pixel. However, the
filter domain for antialiased lines is larger than a single pixel
so that the filter domains overlap. Therefore, the clipping
process must take this into account. Fig. 4 shows how the
clipping in the y-direction works for the 3-by-3 pixel domain
for antialiased lines.

Fig. 4. Vertical clipping for lines.

Fig.  4  shows  a  line  from  a  sub-pixel  location  in  scanline  B  to 
one  in  scanline  F.  The  x  and  color  parameters  of  the  original 
line  are  interpolated  (as  a  function  of  y)  at  the  boundaries 
between scanlines B through F. Then a series of line segments
is created from the original line endpoints and the interpolated
points. One segment is created for all the filter domains on
each scanline (as shown at the right of Fig. 4). This
construction is equivalent to sliding a horizontal band 3
scanlines high down over the original line and noting the
intersection  when  the  band  is  centered  on  a  scanline. 

Operationally, the hardware creates a stream of points in the
following order: the top point of a line segment, an interpolated
point at the boundary between the topmost scanlines (B
and C for the example in Fig. 4), ..., an interpolated point at
the boundary between the bottom-most scanlines (E and F),
and  the  bottom  point.  This  stream  of  points  flows  through  a 
register  followed  by  a  variable  depth  pipeline  that  together 
reconstruct  the  clipped  segments.  The  output  of  the  register  is 
the  bottom  point  for  the  filter  domain  on  a  scanline  while  the 
output  of  the  variable  depth  pipeline  is  the  corresponding  top 
point  (Fig.  5). 
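
The point stream and the reconstruction of overlapping segments can be modeled in software as follows. This is a sketch under assumptions, not a description of the actual hardware: scanline k is taken to cover k <= y < k+1 with y increasing from top to bottom, only x is interpolated (color is omitted), and the register and variable depth pipeline are modeled by indexing into a list with clamping at both ends. The function names are hypothetical.

```python
import math

def point_stream(top, bottom):
    """Top endpoint, interpolated points at each scanline boundary crossed,
    then the bottom endpoint. Points are (x, y) with top y < bottom y."""
    (x0, y0), (x1, y1) = top, bottom
    points = [top]
    y = math.floor(y0) + 1  # first scanline boundary below the top point
    while y < y1:
        t = (y - y0) / (y1 - y0)
        points.append((x0 + t * (x1 - x0), float(y)))
        y += 1
    points.append(bottom)
    return points

def clipped_segments(points, band=3):
    """One (top, bottom) segment per scanline whose filter band touches the line.

    The bottom point plays the role of the register output; the top point
    lags it by up to `band` positions, which models the variable depth pipeline.
    """
    n = len(points)
    segments = []
    for i in range(1, n + band - 1):
        bottom = points[min(i, n - 1)]  # clamp while the band slides off the end
        top = points[max(i - band, 0)]  # clamp while the band slides on
        segments.append((top, bottom))
    return segments
```

For a line running from scanline 1 to scanline 5, analogous to B through F in Fig. 4, the stream holds six points and the reconstruction yields seven segments, one for each scanline from the one above the line to the one below it.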


[Diagram: the input point stream enters a register and a variable depth pipeline; the register outputs the bottom point and the pipeline outputs the top point.]

Fig.  5.  Reconstruction  of  overlapping  segments. 

The  clipping  process  in  the  x-direction  operates  on  these  new 
segments  in  an  analogous  fashion  (y  and  color  interpolated  as 
a  function  of  x).  The  final  result  is  the  intersection  of  the 
original  line  with  its  occupied  filter  domains. 

3.2  Decomposition  and  Clipping  of  Polygons 

Conventional  (aliased)  polygon  scan  conversion  interpolates 
polygon  edges  in  one  direction  to  produce  a  set  of  imbedded 
lines and then interpolates along these lines (in the perpendicular
direction) at the center of each pixel [2]. This generates
the set of points that lie within the original polygon (left
of  Fig.  9).  In  contrast,  the  antialiased  scan  conversion  method 
presented  here  retains  the  shape  of  the  original  polygon  within 
each  filter  domain.  That  is,  it  clips  polygons  (actually  trap¬ 
ezoidal  slices  of  polygons)  to  the  filter  domain  surrounding 
each  pixel.  This  decomposition  and  clipping  is  nonetheless 
similar to conventional scan conversion. As we saw with
lines, this process is a sequence of interpolations on the point
data (x, y, color, opacity, texture coordinates, etc.) that defines
the  original  line  or  polygon. 

Polygon  decomposition  and  clipping  occurs  in  three  stages. 
First, arbitrary polygons (any number of sides, convex or
concave, and with or without holes) are decomposed into
horizontally aligned trapezoids. Secondly, these trapezoids
are clipped in the y-direction into smaller trapezoids within
the filter domains on only a single scanline. This clipping is,
like  the  edge  clipping,  a  series  of  interpolations  followed  by 
reconstruction  using  hardwired  registers.  Then  a  similar 
clipping  occurs  in  the  x-direction.  Usually  this  clipping 
results  in  a  piece  with  just  one  edge  per  filter  domain.  If  there 
are  two  edges  in  the  filter  domain,  the  piece  is  represented  as 
the  difference  of  two  single-edged  pieces. 
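The clipping in y can be pictured as repeated linear interpolation along an edge. The following is a minimal sketch under assumptions that are not taken from the paper: edges are stored as floating-point endpoints with y0 < y1, scanline `row` covers row <= y <= row + 1, and the names `Edge`, `edge_x_at` and `clip_edge_to_scanline` are hypothetical. The hardware described here works in fixed point with hardwired registers; the sketch only illustrates the arithmetic.

```c
#include <assert.h>

/* Hypothetical edge record; assumes y0 < y1 (a non-horizontal edge). */
typedef struct { double x0, y0, x1, y1; } Edge;

/* Linearly interpolate x along the edge at height y. */
double edge_x_at(Edge e, double y) {
    return e.x0 + (e.x1 - e.x0) * (y - e.y0) / (e.y1 - e.y0);
}

/* Clip an edge to the band row <= y <= row + 1.
   Returns 1 and writes the clipped piece, or 0 if the edge misses the band. */
int clip_edge_to_scanline(Edge e, int row, Edge *out) {
    double lo = e.y0 > row ? e.y0 : (double)row;
    double hi = e.y1 < row + 1 ? e.y1 : (double)(row + 1);
    if (lo >= hi) return 0;
    out->x0 = edge_x_at(e, lo); out->y0 = lo;
    out->x1 = edge_x_at(e, hi); out->y1 = hi;
    return 1;
}
```

Color, opacity and texture co-ordinates would be interpolated at the same two heights in exactly the same way as x.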


Fig.  6.  Trapezoidal  decomposition  example. 

Fig. 6 shows an example of the decomposition of a polygon
into a set of horizontally aligned trapezoids (which can
degenerate into triangles). The resulting trapezoids are
defined by:

(1) Left and right edges

(2) A vertical extent determined by the endpoints of the
left and right edges and possibly a concave minimum
(at the top of region 3 in Fig. 6) or a concave maximum
(at the bottom of region 4 in Fig. 6):

\[ y_{\max} = \min \{\, y_{\text{left-top}},\ y_{\text{right-top}},\ y_{\text{concave-min}} \,\} \]
\[ y_{\min} = \max \{\, y_{\text{left-bottom}},\ y_{\text{right-bottom}},\ y_{\text{concave-max}} \,\} \]

Operationally, this trapezoidal decomposition creates two
sequences of points, one for the left bounding edges and one
for the right. Note that the top and bottom horizontal edges are
implicitly defined. The actual vertices of the trapezoid are not
calculated until the clipping in y takes place. Algorithms for
performing this decomposition on arbitrary polygons (including
concave and with holes) are known [3].
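The vertical extent of a trapezoid follows directly from the min/max rule given above. The sketch below assumes y increases upward and represents an absent concave vertex by a NULL pointer; the type and function names (`EdgeExtent`, `Extent`, `trapezoid_extent`) are hypothetical and are not part of the system described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical records for this sketch; y increases upward. */
typedef struct { double y_top, y_bottom; } EdgeExtent;
typedef struct { double y_min, y_max; } Extent;

/* concave_min / concave_max point to the y of a limiting concave
   vertex, or are NULL when no such vertex applies. */
Extent trapezoid_extent(EdgeExtent left, EdgeExtent right,
                        const double *concave_min,
                        const double *concave_max) {
    Extent t;
    /* y_max = min of the two edge tops and the concave minimum */
    t.y_max = left.y_top < right.y_top ? left.y_top : right.y_top;
    if (concave_min && *concave_min < t.y_max) t.y_max = *concave_min;
    /* y_min = max of the two edge bottoms and the concave maximum */
    t.y_min = left.y_bottom > right.y_bottom ? left.y_bottom : right.y_bottom;
    if (concave_max && *concave_max > t.y_min) t.y_min = *concave_max;
    return t;
}
```

Only the extent is computed; as noted above, the actual trapezoid vertices are deferred until the clipping in y.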


Fig. 7. Types of polygon spans.

Once a polygon is decomposed into trapezoids, the left and
right edges are clipped in y (analogously to the line clipping
described earlier) producing a set of (vertically) smaller
trapezoids. These smaller trapezoids are called spans. For
polygons, the size of the filter domain is either a single pixel
or 2 pixels high (in which case the interpolation points are at
the middle of a scanline instead of the boundary between
scanlines). Fig. 7 shows the types of spans that can result from
clipping in y (with the single pixel filter domain shown for
clarity). A polygon span often has its left and right edges
separated in x while the extent of the span in y covers the
entire filter domain (Fig. 7a). The left portion of the span
contains just the left edge, the middle has no edges and the
right portion has just the right edge. At the top or bottom of a
trapezoid from the polygon decomposition, the extent in y
doesn’t necessarily cover the whole filter domain (Fig. 7b).
Lastly, a filter domain can contain two edges near vertices or
for very thin polygons (Fig. 7c). When two edges are present
(which occurs only a small percentage of the time) the filter
result is computed as the difference of the results for the
individual edges (Fig. 8). The case of two edges per pixel is
the only shape that is not handled in a single cycle.

Fig. 8. Handling domains with 2 edges.

Fig. 9 compares conventional scan conversion (left) and the
antialiasing decomposition and clipping (right) for the single
pixel domain (again, for clarity). In the y-direction, both
require 5 interpolations; the only difference is that in
conventional scan conversion the interpolations are at the
middle of scanlines, while in the antialiasing case they are
between scanlines and opposite every encountered vertex
(e.g., vertex 1 in Fig. 9). Because interpolations are shared
between neighboring scanlines, the number of y-interpolations
is, to first order, the same in either case. Thus no
additional calculation is needed for the clipping in y required
for antialiasing. The final stage, clipping in the x-direction, is
performed only on color in the conventional case, while in the
antialiasing case, y also needs to be interpolated at the vertical
boundaries between filter domains (the 4 points indicated by
the small arrows in Fig. 9). This interpolation needs to be
accurate to only 5 bits since y is within a filter domain. In
summary, the antialiasing decomposition and clipping is of
about the same difficulty as the simpler interpolations for
aliased scan conversion. Antialiasing additionally requires
only a low-resolution interpolation of y as a function of x, and
registers for reconstructing the data describing the clipping
result.

Fig. 9. Conventional (left) and antialiasing (right)
decomposition.

3.3 Line Encoding


Clipping a line to the 3-by-3 pixel filter domain yields a
(usually shorter) line segment. Direct addressing of a filter
table using the coordinates of the endpoints of this clipped line
would require a table whose size is much too large. The x and
y coordinates of each endpoint take on a possible 25 values (3
pixels at 1/8 pixel resolution). A straightforward translation
from geometry to table address, i.e.,

\[ \mathrm{ADDR} = x_1 + 25\,y_1 + 25^2\,x_2 + 25^3\,y_2 \]

requires 25^4 = 390,625 table entries, or a half-megaword table!

Our solution is to encode these endpoints using conditional
reflections followed by bit-packing. The reflections
assume only that the antialiasing filter possesses x- and y-
symmetry. Given the result of clipping a line to a filter
domain, the endpoints are conditionally reflected about filter
symmetry axes so that one of the endpoints always lies in a
small region. In particular, given two endpoints (x1, y1) and
(x2, y2), the reflected line always has point (x1, y1) lying in a
particular octant. The reflections are done in two stages as
shown in Fig. 10. First both points are reflected across the
y-axis if x1 > 0, and simultaneously across the x-axis if
y1 > 0. The second stage reflects the result across the line
x = y if the reflected x1 < y1. This sequence of conditional
reflections forces point 1 to end up in the shaded region shown
in Fig. 10. For the 3-by-3 pixel domain at 1/8-pixel resolution
this encoding yields 91 possible values for (x1, y1) and 625
values for (x2, y2) for a total of 56,875 possible cases (a
reduction by a factor of almost seven).

Fig. 10. Two-stage reflections for line encoding.

Fig. 11. Bit-packing point 1.

After the conditional reflection, the co-ordinates for point 1
are packed into a 7-by-13 rectangle (Fig. 11). The conditional
reflections and packing of point 1’s coordinates are easily
done in small PALs. The table address could be computed as

\[ \mathrm{ADDR} = x_1 + 7\,y_1 + 7(13)\,x_2 + 7(13)(25)\,y_2 \]

for look-up in a 64K-word table. However, we prefer to simply
concatenate the co-ordinates (3 bits for x1, 4 bits for y1, 5 bits
for x2 and 5 bits for y2) for direct addressing of a 128K-word
table.

3.4 Polygon Encoding

Analogous to line encoding, polygon encoding consists of
conditional reflections followed by bit-packing. Note that the
material of a polygon lies to the right of a left edge, and to the
left of a right edge. Thus if the antialiasing filter is symmetric
about the y-axis, a right bounding edge can be reflected into
a left edge (Fig. 12a). Likewise, y-symmetry allows reflection
across the filter domain’s x-axis so that the slope of an edge is
always positive (Fig. 12b).

Fig. 12. Conditional reflections on edge-domain
intersections.

These positive-sloped left-edge regions of a filter domain are
called fragments. A fragment is primarily defined by the two
endpoints of the edge inside or on the boundary of the filter
domain. In addition, a fragment can possess an optional third
y value, Yg (Fig. 13), which arises when the span doesn’t
cover the entire vertical extent of the filtering domain (recall
Fig. 7b). Because of the reflections, Yg for a fragment will
always be below the main edge.

Fig. 13. Optional y-values associated with an edge.

Our concept of fragment differs from that of others in that we
explicitly allow edge endpoints to lie within the filter domain.
In addition, we include the global y-value for efficiency in
scan conversion (otherwise these cases would take two cycles,
and they occur a significant percentage of the time).
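Both encodings start from conditional reflections, and for the line case of Section 3.3 they are short enough to state in code. The sketch below is only an illustration under stated assumptions: coordinates are integers in 1/8-pixel units measured from the center of the 3-by-3 domain (so each lies in -12..12), and the names `Line`, `reflect_line` and `count_point1_positions` are hypothetical. The 7-by-13 packing of Fig. 11 is not reproduced; the second function merely counts how many distinct positions point 1 can occupy after reflection.

```c
#include <assert.h>

/* Hypothetical line record: endpoints in 1/8-pixel units, each in -12..12. */
typedef struct { int x1, y1, x2, y2; } Line;

/* Two-stage conditional reflection: afterwards y1 <= x1 <= 0. */
Line reflect_line(Line l) {
    int t;
    if (l.x1 > 0) { l.x1 = -l.x1; l.x2 = -l.x2; }  /* across the y-axis */
    if (l.y1 > 0) { l.y1 = -l.y1; l.y2 = -l.y2; }  /* across the x-axis */
    if (l.x1 < l.y1) {                             /* across the line x = y */
        t = l.x1; l.x1 = l.y1; l.y1 = t;
        t = l.x2; l.x2 = l.y2; l.y2 = t;
    }
    return l;
}

/* Count the distinct positions of point 1 after reflection. */
int count_point1_positions(void) {
    int seen[13][13] = {{0}};
    int count = 0;
    for (int x = -12; x <= 12; x++) {
        for (int y = -12; y <= 12; y++) {
            Line in = { x, y, 0, 0 };
            Line r = reflect_line(in);
            if (!seen[-r.x1][-r.y1]) { seen[-r.x1][-r.y1] = 1; count++; }
        }
    }
    return count;
}
```

Point 2 is carried through the same reflections but is not otherwise constrained, which is why it keeps all 25 x 25 = 625 values.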
A taxonomy of fragment geometry (with the final address
from bit-packing) can be organized into six cases as shown in
Fig. 14:

Case 1. Edge at left or no edge in domain.

Case 2. Edge crossing (top-to-bottom).

Case 3. Edge crossing (left-to-right).

Case 4. Edge crossing (corner).

Case 5. One point inside.

Case 6. Both points inside.

These  cases  are  presented  primarily  for  comparison  with 
previous  methods.  The  bit-packing  follows  directly  from  the 
definition  of  a  fragment,  not  from  the  explicit  consideration  of 
each  of  these  cases. 

For a resolution of 1/8 the pixel spacing, all possible fragments
are encoded into 13 bits for a single pixel filter domain or 17
bits for the 2-by-2 pixel domain. The following description is
for the single pixel case; the 2-by-2 pixel case differs only in
that 4 bits are allocated for coordinates instead of 3 bits.

Fig. 14. Fragment taxonomy and their encodings.


Fig.  15.  Allowable  values  for  fragment  coordinates. 

The fragment encoding is given the top and bottom points of
the fragment’s edge, (Xtop, Ytop) and (Xbot, Ybot), and possibly
Yg. Initially, these coordinates are defined on a grid and can
take on the values 0 through 8 for the 1-by-1 domain (or values
0 through 16 for the 2-by-2 domain). The constraints that
allow efficient bit packing are:

(1) Since Ytop cannot be 0 (Fig. 16b) and Ybot cannot be
8 (Fig. 16a), Ybot and Ytop are encoded in 3 bits by
having 0 signify different positions for the top and
bottom y (Fig. 15).

(2) Similarly, since edge slope is positive Xbot cannot be
8 (Fig. 16c), and so Xbot also requires only 3 bits (Fig.
15).

(3) Because the edge slope is positive, Xtop >= Xbot.
Thus when Xbot > 0, Xtop = 8 can be represented by
the value 0.

Fig. 16. Polygon bit-packing constraints.


All that remains is to encode Yg (if present) and the case Xtop
= 8 with Xbot = 0. This requires one additional signal bit.
When Yg occurs, we must have Xbot = 0. Thus in either case
(Xtop = 8 and Xbot = 0, or Yg present), the signal bit implies
that Xbot is zero. This frees up the bits normally used for Xbot.
Since Yg < Ybot, this allows substituting Yg for Xbot when Yg
is necessary, or substituting Ybot for Xbot for the case Xtop =
8 with Xbot = 0. This bit-packing can be summarized in the C
programming language as follows:

    /* (s,a,b,c,d) denotes the concatenation of a 1-bit signal
       and four 3-bit fields */
    wide = (Xtop == 8 && Xbot == 0);  /* test before Xtop is reduced */
    Xtop = Xtop % 8;
    Ytop = Ytop % 8;
    if (Yg is not needed) {
        if (wide)
            ADR = (1,Ybot,Ybot,0,Ytop);
        else
            ADR = (0,Xbot,Ybot,Xtop,Ytop);
    } else
        ADR = (1,Yg,Ybot,Xtop,Ytop);

When there is no edge in the domain, the encoding defaults to
(0, 0, Ybot, 0, Ytop). This bit-packing requires a total of 1 + 3
+ 3 + 3 + 3 = 13 bits for the single pixel filter domain, and
similarly 17 bits for the 2-by-2 pixel domain. This is not the
most compact packing possible (there are 69,632 cases for the
2-by-2 pixel domain), but for memory sizes in powers of two
it’s good enough.
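For readers who want to experiment with the single-pixel packing, the following is a compilable sketch of the summary above. Several details are assumptions rather than facts from the paper: the signal bit is placed in the most significant position, the presence of Yg is passed as an explicit flag, the test for Xtop = 8 is made before Xtop is reduced modulo 8, and the names `pack13` and `encode_fragment` are hypothetical.

```c
#include <assert.h>

/* Concatenate a 1-bit signal and four 3-bit fields into 13 bits.
   The field order (signal bit first) is an assumption of this sketch. */
unsigned pack13(unsigned s, unsigned a, unsigned b, unsigned c, unsigned d) {
    return (s << 12) | (a << 9) | (b << 6) | (c << 3) | d;
}

/* Grid co-ordinates are 0..8; has_yg is nonzero when Yg is needed. */
unsigned encode_fragment(unsigned xbot, unsigned ybot,
                         unsigned xtop, unsigned ytop,
                         int has_yg, unsigned yg) {
    unsigned xt = xtop % 8;   /* 8 wraps to 0 */
    unsigned yt = ytop % 8;   /* 8 wraps to 0; Ytop is never 0 */
    if (has_yg)
        return pack13(1, yg, ybot, xt, yt);    /* Yg replaces Xbot (= 0) */
    if (xtop == 8 && xbot == 0)
        return pack13(1, ybot, ybot, 0, yt);   /* Ybot replaces Xbot */
    return pack13(0, xbot, ybot, xt, yt);
}
```

Every result fits in 13 bits, so it can index an 8K-entry filter table directly.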


3.5 Alpha-Blending in the Frame Buffer

For both lines and polygons the filter weight obtained from the
table is multiplied by the opacity (opacity = 1 - transparency)
of the polygon or line, and the result is called α. This α
controls the blending of the new color with the existing color
(or the background color) in the frame buffer. Color can be
blended according to the rules of compositing [22]:

\[ C_{FB} = \alpha_{IN} \, C_{IN} + (1 - \alpha_{IN}) \, C_{FB} \]
\[ \alpha_{FB} = \alpha_{IN} \, \alpha_{IN} + (1 - \alpha_{IN}) \, \alpha_{FB} \]

where C stands for any color component, and the subscripts
FB and IN designate the current frame buffer contents and the
input values coming into the frame buffer respectively. This
is used primarily for rendering transparent surfaces in back-
to-front order and for lines. There is also a mode where the
pixel can be “filled” until it is “full”:

\[ \alpha = \min(\alpha_{IN},\ 1 - \alpha_{FB}) \]
\[ C_{FB} = \alpha \, C_{IN} + (1 - 0) \, C_{FB} \]
\[ \alpha_{FB} = \alpha_{FB} + \alpha \]

This is called the image accumulation mode, and it is primarily
used for rendering polygons in front-to-back order.
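The two blending modes can be sketched in a few lines of C. This is an illustration only: it handles a single color component in floating point, the names `Pixel`, `blend_color` and `accumulate` are hypothetical, and the accumulation-mode color update is assumed to add the clamped contribution to what is already stored, which is consistent with the statement that the pixel is filled until it is full.

```c
#include <assert.h>

/* Hypothetical frame-buffer pixel: one color component plus stored alpha. */
typedef struct { double color; double alpha; } Pixel;

/* Compositing rule for a color component:
   C_FB = a_IN * C_IN + (1 - a_IN) * C_FB. */
double blend_color(double c_fb, double c_in, double a_in) {
    return a_in * c_in + (1.0 - a_in) * c_fb;
}

/* Image accumulation mode: clamp the incoming alpha to the room
   left in the pixel, then add the contribution (assumed additive). */
void accumulate(Pixel *fb, double c_in, double a_in) {
    double room = 1.0 - fb->alpha;
    double a = a_in < room ? a_in : room;
    fb->color += a * c_in;
    fb->alpha += a;
}
```

Once the stored alpha reaches 1 the pixel is full and later, more distant polygons contribute nothing.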

3.6  Hardware  Details 

The system as a whole consists of three 9U-sized VME circuit
boards built with off-the-shelf TTL and CMOS parts. At
present no ASICs are used. The first board performs the
decomposition and clipping in the y-direction. The second
board does the clipping in the x-direction, Gouraud shading,
symmetry encoding, filter table access, and alpha calculation.
The third board contains the frame buffer, alpha blending, and
video output.

All encoding is done in a layer of PAL logic, and the resulting
tables fit in a 1 Mbit RAM for either lines or polygons. The
required hardware is surprisingly small. Fig. 17 compares the
module that implements the encoding logic and filter tables
with a similar module for Gouraud shading. While both
modules are about the same size, the Gouraud module is
double-sided while the encoding-filter module is only single-
sided.

The system runs at 20 MHz and fills up to 4 pixels in every
clock cycle. This results in a polygon throughput (when used
with a front-end performing hidden surface removal) of 600
thousand antialiased polygons per second (interlaced) or 300
thousand polygons (non-interlaced) for a 1-megapixel
display. The line throughput (averaged over all orientations) is
2 million antialiased 10-pixel vectors per second. This
performance is competitive with larger systems requiring custom
VLSI.


Fig. 17. Encoding-filter and Gouraud modules.

4 HIDDEN SURFACE REMOVAL

Supersampling has the advantage that it integrates easily with
point sampling methods for hidden surface removal such as z-
buffering or ray tracing. Because antialiasing requires hidden
surface removal, we must address just how the hidden surface
problem might be solved given that the point sampling
paradigm (z-buffer or ray tracing) has been abandoned. A full
consideration of this issue is beyond the scope of this paper.
Our goal here is simply to suggest that useful hidden surface
techniques already exist, and there is room for even better
ones. We will briefly consider two approaches: priority based
rendering and a hardware implementation of a scan-line
hidden surface algorithm.

4.1 Priority Methods

One way to employ this antialiasing method is to render
front-facing polygons in front-to-back order using the image
accumulation mode described in Section 3.5. The resulting image
is correctly rendered at all pixels except those that contain
contours from three or more overlapping relevant surfaces
[26] (a relevant surface is a forward-facing connected set of
polygons or other surface elements). Let’s explicitly consider
several cases to understand where it’s exact and where it errs
(Fig. 18).

Case 1, where there is one polygon in the domain, is trivially
seen to be correctly filled. Case 2, where there are no visible
contour edges in the domain, is also correctly filled because all
the pieces will simply add up to unity. Case 3, where there is
only one contour (which may consist of more than one edge)
and therefore only two visible relevant surfaces, is also
correctly rendered. The front-most surface is accumulated in
the pixel and the back-most surface “fills” the remainder of the
pixel. This assumes that the shading gradients for the
back-most surface are small, but the error in shading is inversely
proportional to the visible area of the back-most surface. Thus,
the shading error can be large only when the total contribution
of the back-most surface is small. Case 4, where there are two
non-intersecting contours and the two front-most surfaces do
not overlap, is also rendered correctly by the same logic. The
only errors occur for Cases 5 and 6, where there are three or
more visible surfaces and the first two overlap. Clearly these
two cases occur only for a few pixels in most scenes.
Furthermore, the front-most surface is always weighted correctly.
This has led Crow to conclude that the errors in this approach
have “not proven to be noticeable in practice” [5].

Plate 1 shows an example of front-to-back rendering at a
resolution of 1024 by 768 pixels. Methods for determining
polygon priority either a priori [10] or on-the-fly are well
known. The advantage of priority methods is that the time to
input polygonal data is proportional to the number of polygon
vertices and the time to render is proportional to the total area
(visible or not) of all polygons within the viewport. The
disadvantage is that the rendering time is also proportional to
the average depth complexity of the scene.

Fig. 18. Priority rendering cases. Legend: front-most,
second front-most, and third front-most visible surface.

4.2 A Scan-Line Algorithm


It is intriguing that, despite the significant literature on hidden
surface removal, almost all hardware systems use the z-buffer,
what Sutherland aptly called “brute-force image space” [26].
Furthermore, most proposals for future hardware systems
employ either the z-buffer or ray tracing [12]. What’s
happened to analytic and optimal hidden surface removal? Is the
hidden surface problem simply an academic exercise? We
don’t think so. We believe that there are simple reasons why
analytic hidden surface methods are not common in hardware.
Three of these reasons are:

(1)  Numerical  problems 

(2)  Absence  of  optimized  programmable  hardware 

(3)  Asymptotic  efficiency  is  not  linear. 


Let’s consider numerical issues first. All efficient hidden
surface algorithms make extensive use of calculated edge
intersections. Unfortunately, the edge (or line) intersection
calculation is not numerically stable. This calculation
requires the quotient of two cross products of endpoint
co-ordinates. The problem is that the denominator of this
quotient approaches zero as the two line segments being tested
for intersection approach parallel. Near zero, round-off errors
in computing the denominator can result in a completely
erroneous answer - a misplaced intersection location or even
no intersection where there is one and vice-versa. The
solution to this problem is either to use fixed point
co-ordinates with extended precision or use accurate dot product
calculations for floating point co-ordinates [21] [27].
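The quotient in question is easy to see in code. The sketch below is a textbook parametric segment-intersection test, not the implementation discussed in this paper; the names `Vec2`, `cross2` and `intersect`, and the caller-supplied tolerance `eps`, are assumptions made for illustration. Rejecting small denominators with a tolerance is a crude stand-in for the extended-precision or accurate dot-product remedies cited above.

```c
#include <assert.h>

typedef struct { double x, y; } Vec2;

/* 2D cross product (z component of the 3D cross product). */
double cross2(Vec2 a, Vec2 b) { return a.x * b.y - a.y * b.x; }

/* Intersect segments p0-p1 and q0-q1. Returns 1 and writes the point,
   or 0 if they do not intersect or are parallel to within eps. */
int intersect(Vec2 p0, Vec2 p1, Vec2 q0, Vec2 q1, double eps, Vec2 *out) {
    Vec2 r = { p1.x - p0.x, p1.y - p0.y };
    Vec2 s = { q1.x - q0.x, q1.y - q0.y };
    Vec2 d = { q0.x - p0.x, q0.y - p0.y };
    double denom = cross2(r, s);          /* tends to 0 as lines become parallel */
    if (denom > -eps && denom < eps) return 0;
    double t = cross2(d, s) / denom;      /* parameter along p0-p1 */
    double u = cross2(d, r) / denom;      /* parameter along q0-q1 */
    if (t < 0.0 || t > 1.0 || u < 0.0 || u > 1.0) return 0;
    out->x = p0.x + t * r.x;
    out->y = p0.y + t * r.y;
    return 1;
}
```

When `denom` is tiny, a round-off error of the same magnitude in its computation changes `t` and `u` arbitrarily, which is the instability described above.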

The second issue is the absence of optimized hardware
architectures for running graphics algorithms. Historically, the
widespread use of z-buffers was coincident with the advent of
VLSI powerful enough to embed the z-buffer algorithm in
silicon. Software z-buffers were too slow. Is this the general
purpose graphics processor? In comparison, digital signal
processing (DSP) has its genre of architectures and chips.
DSP chips are optimized for FFTs and filtering (fast address
calculations and multiply-add instructions). Also, today’s
workstations have architectures optimized for performance
with compilers (and Unix), i.e., RISC. There are no
corresponding programmable architectures optimized for
geometric calculations. In practice manufacturers use DSP chips or
RISC chips for graphics. We think an optimized graphics
architecture exists and offers significant performance
improvement over using existing DSP or RISC processors as
graphics substitutes. One possible architecture is described
below. While it has much in common with DSP chips or RISC
chips, the difference is in how the major functional blocks are
organized and optimized.

First, we observe that graphics algorithms have two
components - a topological component and a geometric component.
The topological component deals with list-like relationships
while the geometric component deals with calculations on
geometric entities such as points, lines, plane equations, etc.
We propose handling these two components in separate
processors with separate data memories. We call this
arrangement a topology-geometry processor (TGP). It has the generic
structure shown in Fig. 19.

Fig. 19. Topology-geometry processor (TGP).

The topology processor is optimized for manipulating lists
and pointers. For example, in a single cycle it can select a base
address (like a “C” structure address), calculate an offset
address (an item within that structure), and read or write the
corresponding memory location. Calculating these addresses
in a RISC processor would take several additional cycles.
Such address calculation units are more common in DSP
chips. The geometry processor is a floating-point or fixed
point arithmetic unit. It receives macro commands from the
topology processor (e.g., “do these two lines intersect?”). By
running it as a separate processor with separate datapaths,
simultaneous operation of both processors is more easily
handled than on chips that require complex interleaved
software in order to dispatch integer and floating-point operations
in the same cycle. Clearly, this is more or less the same amount
of hardware as exists in current RISC processors; it’s just
organized and optimized a little differently. Lastly, the data
and instruction memories are small enough to be implemented
in SRAM, eliminating caches.

We have performed a preliminary assessment of one possible
implementation of this architecture. The implementation uses
two TGPs in series. The first TGP takes the world data base,
transforms it and discards polygons which are back-facing or
outside the field of view. Its output is a topologically-
connected screen-space database of potentially visible
polygons. The second TGP has two topology processors and two
geometry processors. It executes a scanline hidden surface
algorithm and decomposes the visible pieces of polygons into
trapezoids (as in Section 3.2). This scan-line algorithm uses
the plane-sweep [20] paradigm at the highest level, with
pointers to distinguish internal and contour edges [24]. The
algorithm processes each polygon vertex as well as
intersections between contour edges and intersections between visible
contour edges and visible internal edges. Each of these cases
is processed in 30-50 machine cycles. Thus a 25 MHz
implementation results in polygon throughputs of 300K-
600K polygons per second for all but the most pathological
cases. Extending this algorithm to multiple processors for
parallelism (in screen space) is straightforward.

Although this scanline algorithm is adequate for real-time
(30Hz-60Hz) scenes of moderate complexity (5K-10K
potentially visible polygons per frame), there is still the question of
how effective such an approach is for more complex scenes.
It is known that the asymptotic performance of analytic hidden
surface algorithms is not linear. In particular

• Intersections can be O(n^2)

• Sorting is O(n log(n))

These problems are topics of current research. One obvious
approach is some degree of parallel processing for hidden
surface removal [9]. Other possible approaches are

• Hybrid algorithms

• Content-addressable and associative memories




Hybrid algorithms use some combination of priority and full
hidden surface removal. In areas with many intersections,
priority can be more efficient (trading intersections against
depth complexity). The non-linear efficiency of sorting can
be ameliorated by noting that the n log(n) complexity is for
traditional algorithms running on traditional machines. In
contrast, content addressable and associative memories can
sort in linear time. Very efficient CAM cells (in terms of
silicon area) have been reported in the literature and the use of
such innovative memory architectures holds interesting
possibilities.


5  SUMMARY  AND  CONCLUSIONS 

We  have  described  a  straightforward  and  comprehensive 
implementation  of  prefiltering.  Our  design  leverages  the 
availability  of  large  (1  M-bit)  semiconductor  memory  to 
provide  an  efficient  system,  both  in  hardware  complexity  and 
speed.  The  high-resolution  one-eighth  pixel  grid  provides 
excellent  image  quality  and  smooth  motion  for  both  lines  and 
polygons.  The  antialiasing  filters  themselves  can  be  arbitrary 
symmetric  functions  over  a  2-by-2  pixel  domain  for  polygons 
and  a  3-by-3  domain  for  lines. 

This antialiasing approach is used in commercially available
hardware. We believe it encourages the development of
algorithms and special hardware for priority or full hidden
surface removal. Such systems would combine real-time
performance with image quality not possible with z-buffer or
ray tracing architectures.

Abstract

Direct volume rendering is a computationally intensive
operation that has become a valued and often preferred visualization tool.
For maximal data comprehension, interactive manipulation of the
rendering parameters is desirable. To this end, a reasonable target
would be a system capable of displaying 128^3 voxel data sets at
multiple frames per second. Although the computing resources
required to attain this performance are beyond those available in
current uniprocessor workstations, multicomputers and VLSI
rendering hardware offer a solution. This paper describes a volume
rendering algorithm for MIMD message passing multicomputers.
This algorithm addresses the issues of distributed rendering, data set
distribution, load balancing, and contention for the routing network.
An implementation on a multicomputer with a 1D ring network is
analyzed, and extension of the algorithm to a 2D mesh topology is
described. In addition, the paper presents a method of exploiting
screen coherence through the use of VLSI pixel processor arrays.
Though not critical to the general algorithm, this rendering approach
is demonstrated in the example implementation where it serves as a
hardware accelerator of the rendering process. Commercial graphics
workstations use pixel processors to accelerate polygon rendering;
this paper proposes a new use of this hardware for accelerating
volume rendering.

1.  Introduction 

Direct volume rendering is the common name that describes the
viewing of volume data as a semi-transparent cloudy material. Its
advantages are that much or all of the volume may be visible to the
observer at one time; there is no need to introduce intermediate
geometry that doesn't really exist in the data. We assume the input
data is a scalar field sampled at the vertices of a 3D rectilinear lattice
- a situation often encountered in medical and simulation data. Plate
1 is an image of a representative medical data set of dimensions
128x128x124. The following 3-step conceptual model of the
volume rendering process is based on previously published derivations
[Blinn82] [Kajiya’84]. Much of this example comes from Wilhelms
and Gelder [Wilhelms*91].

1 - Reconstruct the continuous 3D scalar function F by
convolving each sample point f with a reconstruction filter kernel
K.

\[ F(x,y,z) = \sum_{x,y,z} f * K \]

2 - Apply an opacity O and shading S function to the
continuous scalar field. These user definable transfer functions yield a
differential opacity Ω = O(F) and color emittance E = S(F) at
each point in the volume as a function of the scalar field
properties at that point. The Ω and E fields should then be
low-pass filtered for resampling in the next step.

3 - Integrate an intensity and transparency function along
sample view-ray paths through the volume. The integrals may
be taken toward or away from the viewer. When taken towards
the viewer, the accumulated intensity I and transparency T
along the sample ray is

\[ I(p) = T(p) \int_0^p \frac{E(v)}{T(v)}\,dv \quad \text{where} \quad T(p) = e^{-\int_0^p \Omega(u)\,du} \]

The intensity equation has an analytic solution if we assume Ω
and E constant over the interval [0,p]. By applying this
constraint over limited size intervals, we may approximate the
intensity and transparency of any interval. Successive intervals
are composited to obtain the cumulative intensity or color
reaching the viewer along the ray.
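The compositing of successive intervals can be sketched as follows. This is a minimal illustration, not the algorithm of this paper: it assumes each interval's own intensity and transparency have already been evaluated, that `iv[0]` is the interval nearest the viewer, and that light from a farther interval is attenuated by the transparency of everything in front of it. The names `Interval`, `composite` and `composite_ray` are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-interval result: emitted intensity reaching the
   front of the interval, and the interval's transparency (0..1). */
typedef struct { double intensity; double transparency; } Interval;

/* Place 'back' behind 'front': back's light is attenuated by front. */
Interval composite(Interval front, Interval back) {
    Interval r;
    r.intensity = front.intensity + front.transparency * back.intensity;
    r.transparency = front.transparency * back.transparency;
    return r;
}

/* Composite n intervals ordered from nearest (iv[0]) to farthest. */
Interval composite_ray(const Interval *iv, int n) {
    Interval acc = { 0.0, 1.0 };   /* empty space: no light, fully clear */
    for (int i = 0; i < n; i++)
        acc = composite(acc, iv[i]);
    return acc;
}
```

Because the operation is associative, groups of intervals can be composited independently and merged afterwards, as long as their depth order is preserved.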

There are four common algorithmic approaches to approximating the
above three-step process in actual implementations.

1.1. Ray-casting - The volume is resampled along view rays
[Levoy88] [Sabella88] [Upson*88]. The $\Omega$ and E functions must be
reconstructed at the new sample points along the rays. Typically, 3D
reconstruction is done by trilinear interpolation of the $\Omega$ and E
function values evaluated at the lattice vertices. Successive samples
along a ray are composited to produce the final ray color.
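
Trilinear reconstruction at a ray sample point can be sketched as below. This is an illustrative Python sketch only; it assumes the field is stored as a nested list indexed `v[i][j][k]`, that sample coordinates are non-negative lattice units, and the name `trilinear` is hypothetical.

```python
def trilinear(v, x, y, z):
    """Interpolate the lattice v[i][j][k] at fractional position (x, y, z)."""
    # Index of the lower lattice corner, clamped so the upper neighbor exists.
    i = min(int(x), len(v) - 2)
    j = min(int(y), len(v[0]) - 2)
    k = min(int(z), len(v[0][0]) - 2)
    fx, fy, fz = x - i, y - j, z - k

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on the four cell edges, then along y, then z.
    c00 = lerp(v[i][j][k], v[i + 1][j][k], fx)
    c10 = lerp(v[i][j + 1][k], v[i + 1][j + 1][k], fx)
    c01 = lerp(v[i][j][k + 1], v[i + 1][j][k + 1], fx)
    c11 = lerp(v[i][j + 1][k + 1], v[i + 1][j + 1][k + 1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)
```

A linear field is reproduced exactly by this scheme, so sampling the centre of a cell returns the average of its eight corners.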

1.2. Serial Transformations - An affine view transformation is
decomposed into three sequential 3D shear operations. Each shear
is effected by a 1D transformation of the form x' = Ax + B [Drebin*88]
[Hanrahan90]. Since these transformations require only a 1D
reconstruction filter, cubic splines are commonly used to facilitate
the resampling. The resampled volume is screen-aligned and ready
for integration and compositing.

1.3. Splatting - This approach computes the effect of each voxel on
the pixels near the point to which it projects. Slices of voxels are
sorted by depth order and reconstructed by convolution with a 2D
filter kernel. The reconstructed function is resampled and accumulated
on a screen-aligned grid. Successive slices are composited to
produce the final image. Since the filter is position-invariant for
affine transformations, software table methods are often used to
quickly approximate it [Westover89].

1.4. Cell Projection - Volumes are decomposed into polyhedra
whose vertices are derived from the sampled data lattice [Shirley*90]
[Max*90] [Wilhelms*91]. The polyhedra are converted to polygons
by projecting them under the view transformation. Reconstruction
is done to obtain each polygon's vertex values for the opacity and
emission functions. The resulting polygons are rendered by conventional
means using a painter's algorithm and alpha compositing.
These methods make effective use of existing polygon rendering
hardware. The reconstruction functions are usually linear since most
rendering hardware does linear interpolation of the polygon vertex
values.

2.  Rendering  Hardware 

The  rendering  method  proposed  here  is  a  parallelized  splatting 
approach.  Using  multiple  processors  with  parallel  frame  buffer 
access, a splat kernel is produced and merged into an image at many
pixels simultaneously. Current graphics workstations [SGI] use
multiple processors to allow parallel access of pixel values in a frame
buffer. Such groups of processors with parallel frame buffer access
are what we refer to with the term pixel processors.

This idea of using pixel processors to accelerate volume rendering is
not totally new. Cell projection methods were created to make use
of it. Laur and Hanrahan approximate splat filter kernels with groups
of polygons rendered by dedicated hardware [Laur*91]. The new
aspect of the method proposed here is that of coercing the hardware
to render a splat filter kernel directly as a single graphic primitive.
The next sections describe two methods whereby pixel processor
arrays create splat kernels for convolution with voxels.


2.1. Textured Kernel - The kernel primitive can be thought of as
a polygon with a nonlinear interpolation function. Arbitrary interpolation
functions may be defined as textures. Given an array of pixel
processors capable of texture lookup and multiplication, the screen
coherence of each splat can be exploited. Current generation
graphics workstations have this capability, although they may not yet
offer the firmware needed to exploit it [SGI]. The splat-polygon's
color and opacity are those of the voxel it represents. The splat-polygon
with its texture coordinates is transformed and rendered
normally except for the additional processing required by the pixel
processors (Fig. 1). The pixel processors compute the texture value
based on the texture coordinates at each pixel. This texture is the
kernel function which is used to weight the polygon color and
opacity. The convolution results at each pixel are accumulated in a
slice buffer [Westover89]. When a complete slice of voxels is
splatted, the pixel processors composite the slice buffer into the
image.

In  lieu  of  texture  lookup  capability,  a  kernel  may  be  computed.  This 
latter  approach  is  most  appropriate  for  Pixel-Planes  5,  the  target 
machine for the implementation described here. Before detailing the
kernel  computation  algorithm,  the  next  section  briefly  describes 
some  essential  aspects  of  this  machine. 

2.2. Pixel-Planes 5 Overview - This machine has multiple i860-based
Graphics Processors (GPs), and multiple SIMD pixel processor
arrays called Renderers (Fig. 2). Each Renderer is a 128x128
array of pixel processors capable of executing a general purpose
instruction set. GPs send Renderers opcode streams which are
executed in SIMD fashion. Renderers also have a Quadratic Expression
Evaluator (QEE) that may be configured to occupy any screen
position [Fuchs*89]. Special QEE opcodes evaluate the function

$$Q = Ax + By + C + Dx^2 + Exy + Fy^2$$

at each processor in the Renderer for its unique x,y location. Configuring
a Renderer to a new screen position is accomplished by
offsetting the QEE so that each pixel processor's QEE result is based
on its offset x,y location. The coefficients A - F are part of the
instruction stream from the GPs. Renderers also have ports that allow
data movement in and out of the processor array under GP control.
The GPs, Renderers, a Frame Buffer, and workstation host all
communicate over an eight-channel 1D ring-network whose aggregate
bandwidth is 160 Mwords per second.


Fig.1. Splatting with hardware for textured polygons


Fig.2. Pixel-Planes 5 system components

2.3. Computed Kernel and Slice Splatting - Polynomials are reasonable
approximations to a gaussian filter kernel and easily computed.
The function

$$2r^3 - 3r^2 + 1 \quad \text{where} \quad 0 < r < 1$$

is a low order gaussian approximation, but is somewhat awkward to
compute directly. A quartic

$$Q^2 = (1 - r^2)^2$$

is a more practical solution since a quadratic term $Q = (1 - r^2)$ may be
computed directly by the QEE and later squared at all pixels in
parallel. The kernel value ($Q^2$) at each pixel scales a voxel's color and
opacity. Volumes are splatted a slice at a time by summing the scaled
color and opacity values into a slice buffer and then compositing the
slice buffer into the accumulated image. The squaring and scaling
operations are expensive and should be factored out of the per-voxel
inner loop. The kernels of adjacent voxels, however, typically
overlap so we must square and scale more than once per data slice.
By limiting the kernel radius to two inter-voxel distances, every
fourth voxel in x and y will affect a disjoint set of pixels (Fig. 4).
Therefore, in one pass we can splat one-sixteenth of the voxels in a
slice before squaring, scaling, and accumulation into the slice buffer
must be done. (A kernel radius of about 1.6 seems to yield the best
overall image.) Pseudocode to implement the slice splatting process
on Pixel-Planes 5 is given in figure 3.

for (xs = 0; xs < 4; xs++) {            /* cycle all 16 passes */
  for (ys = 0; ys < 4; ys++) {
    for (x = xs; x < slice_xsize; x += 4) {
      for (y = ys; y < slice_ysize; y += 4) {
        /* take every fourth voxel in x and y */
        GP computes QEE coefficients needed to produce Q at
          position S in the Renderer array;
        GP sends opcode and coefficients to Renderer which
          computes Q values in the pixel array;
        Renderer enables pixels with Q > 0;
        /* only enabled pixels participate in the next instruction */
        Renderer pixels save Q and load voxel color and
          opacity sent by GP;
      } }                               /* end of pass */
    GP instructs Renderer to square the saved Q values at
      ALL pixels;                       /* one mult */
    GP instructs Renderer to scale the color and opacity
      by Q^2;                           /* two mults */
    GP instructs Renderer to accumulate scaled color and
      opacity into slice buffer;        /* two adds */
  } }                                   /* end of slice */
GP instructs Renderer to composite slice buffer into image;

Fig.3. Pseudocode for splatting one slice using
Pixel-Planes 5 Renderers
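
The independence argument behind the sixteen passes can be checked with a short Python sketch. It is illustrative only and assumes an unrotated view in which kernels are circles measured in inter-voxel units; the names `pass_voxels` and `footprints_disjoint` are hypothetical.

```python
def pass_voxels(xs, ys, xsize, ysize, stride=4):
    """Voxel (x, y) indices splatted in pass (xs, ys): every fourth voxel."""
    return [(x, y)
            for x in range(xs, xsize, stride)
            for y in range(ys, ysize, stride)]

def footprints_disjoint(voxels, radius):
    """True if no two circular kernels of this radius share interior points.

    Pixels are enabled only where Q > 0, so kernels that merely touch
    at their boundary do not interfere.
    """
    limit = (2.0 * radius) ** 2
    for a in range(len(voxels)):
        for b in range(a + 1, len(voxels)):
            dx = voxels[a][0] - voxels[b][0]
            dy = voxels[a][1] - voxels[b][1]
            if dx * dx + dy * dy < limit:
                return False
    return True
```

With a stride of four, a radius of up to two inter-voxel distances keeps every kernel in a pass disjoint, and the sixteen (xs, ys) offsets together visit each voxel of the slice exactly once.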


When all the voxels in slice i are splatted, the accumulated slice color
and opacity at each pixel are composited behind the current image of
i-1 slices to produce an image of i slices. The compositing operation
is efficiently done in parallel for each pixel of the array.

$$\mathrm{Alpha}_i := \mathrm{Alpha}_{i-1} + \mathrm{Alpha}_{slice} \cdot (1 - \mathrm{Alpha}_{i-1})$$

$$\mathrm{Color}_i := \mathrm{Color}_{i-1} + \mathrm{Color}_{slice} \cdot (1 - \mathrm{Alpha}_{i-1})$$
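
For a single pixel, the two compositing assignments can be sketched in Python as follows. This is a minimal sketch that assumes the slice color is already weighted by the slice opacity, as it is when the kernel scales both color and opacity; the name `composite_behind` is hypothetical.

```python
def composite_behind(image, slice_buf):
    """Composite one slice behind the accumulated image at a single pixel.

    image and slice_buf are (color, alpha) pairs; slices arrive front to back.
    """
    color_img, alpha_img = image
    color_slice, alpha_slice = slice_buf
    remaining = 1.0 - alpha_img           # light still passing the image
    return (color_img + color_slice * remaining,
            alpha_img + alpha_slice * remaining)
```

Once a pixel is fully opaque, later slices leave it unchanged, which is what makes the front-to-back order natural for this update.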

For arbitrary rotations, different slice orientations are used and the
kernels are made elliptical to preserve the independence of pixels
during each pass (Fig. 4). For affine projections, the elliptical shape
is constant for all voxels making the D, E, and F quadratic coefficients
constant over the whole frame. In this case, a linear expression
evaluator (LEE) is all that is needed on the pixel processor since the
D, E, and F terms may be computed once per frame at each pixel and
added to each data point's linear term. This, however, exacts a small
performance penalty, so in this implementation the available QEE
was used.


Fig.4. Elliptical kernel extents for one pass showing
independence of every fourth voxel in x and y

The  elliptical  kernel  coefficients  are  computed  from  the  scaling  and 
rotation  portions  of  the  view  transformation  V.  We  first  scale  V  to 
account  for  the  kernel  radius  T,  specified  as  the  number  of  inter-voxel 
distances. 

[M]=  T[V] 

Now the pixel coordinates <x, y, z> must be transformed back to the
data space coordinates <i, j, k> where the computed kernels are
always radially symmetric, of unit radius, and therefore, the kernels
of each pass are non-overlapping.

Let k = 0 always. Solving for z produces z = ax + by where

$$a = -\frac{M_{31}}{M_{33}} \quad \text{and} \quad b = -\frac{M_{32}}{M_{33}}$$

Now we can express data coordinates <i, j> in terms of pixel
coordinates <x, y>,

$$i = P_{11}x + P_{12}y, \qquad j = P_{21}x + P_{22}y$$

where

$$[P] = \begin{bmatrix} M_{11} + aM_{13} & M_{12} + bM_{13} \\ M_{21} + aM_{23} & M_{22} + bM_{23} \end{bmatrix}$$

The kernel function is $Q^2 = (1 - r^2)^2$ but we render

$$Q = (1 - r^2) = 1 - [(i - i_0)^2 + (j - j_0)^2]$$

where $i_0$ and $j_0$ define the center of the kernel. Using P, we transform
Q into a function of pixel coordinates <x, y>, and after some algebra
we obtain the coefficients that allow the QEE to directly evaluate Q
in the Renderer.

$$A = 2[x_0(P_{11}^2 + P_{21}^2) + y_0(P_{11}P_{12} + P_{21}P_{22})]$$

$$B = 2[y_0(P_{12}^2 + P_{22}^2) + x_0(P_{11}P_{12} + P_{21}P_{22})]$$

$$C = 1 - x_0^2(P_{11}^2 + P_{21}^2) - 2x_0y_0(P_{11}P_{12} + P_{21}P_{22}) - y_0^2(P_{12}^2 + P_{22}^2) = 1 - \tfrac{1}{2}(Ax_0 + By_0)$$

$$D = -P_{11}^2 - P_{21}^2$$

$$E = -2(P_{11}P_{12} + P_{21}P_{22})$$

$$F = -P_{12}^2 - P_{22}^2$$

Note that D, E, and F do not depend on the kernel position <x_0, y_0>.
Given any kernel position and allowing for precomputation of the
view dependent terms, computing A or B requires only two multiplications
and one addition. Computing the C coefficient requires two
multiplications and two additions.
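
The coefficient computation can be sketched in Python and checked against a direct evaluation of Q. This is a hedged illustration, not the paper's code: it assumes the index convention i = P11 x + P12 y, j = P21 x + P22 y for the 2x2 matrix P, and it computes C from A and B as 1 - (A x0 + B y0) / 2, which is what expanding Q gives. The names `qee_coefficients` and `q_direct` are hypothetical.

```python
def qee_coefficients(p, x0, y0):
    """Coefficients of Q = A x + B y + C + D x^2 + E x y + F y^2.

    p is a 2x2 matrix ((p11, p12), (p21, p22)) assumed to map pixel
    offsets to data space; (x0, y0) is the kernel centre in pixels.
    """
    (p11, p12), (p21, p22) = p
    # View-dependent terms: computed once per frame for affine views.
    sx = p11 * p11 + p21 * p21
    sy = p12 * p12 + p22 * p22
    sxy = p11 * p12 + p21 * p22
    d, e, f = -sx, -2.0 * sxy, -sy
    # Position-dependent terms: computed once per voxel.
    a = 2.0 * (x0 * sx + y0 * sxy)
    b = 2.0 * (y0 * sy + x0 * sxy)
    c = 1.0 - 0.5 * (a * x0 + b * y0)
    return a, b, c, d, e, f

def q_direct(p, x0, y0, x, y):
    """Reference value Q = 1 - r^2 evaluated in data space."""
    (p11, p12), (p21, p22) = p
    di = p11 * (x - x0) + p12 * (y - y0)
    dj = p21 * (x - x0) + p22 * (y - y0)
    return 1.0 - (di * di + dj * dj)
```

At the kernel centre Q evaluates to 1, and Q falls to zero on the ellipse that is the image of the unit circle in data space.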

3. Multicomputer Rendering Algorithm

Parallel volume rendering algorithms must cope with potentially
moving massive amounts of data every frame. For this reason,
optimizing the distribution of data among the memory spaces in the
machine is important. Full data replication is a trivial option deemed
too expensive for most cases. Partial replication is often necessary
or desirable. Data subsets may be slabs or blocks, packed or
interleaved, and static or dynamic. The proposed algorithm makes
use of a static, interleaved, slab distribution. It is static because each
voxel is assigned a home node (or nodes) where it remains. It is
interleaved since each node has several subvolumes of data, slices to
be exact, that are not adjacent to each other. Slices are identified as
slabs since they extend to the volume boundaries in two dimensions.
This distribution is simply achieved by assigning slices to nodes in
a round-robin fashion. It was chosen for load balancing and memory
limitation reasons. If packed slices (a single slab) were used, there
exists a strong possibility of the outer slice sets having less non-zero
data to render than the inner slice sets. By interleaving slices, the
spatial distribution of data at each node is similar.
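
The round-robin slice assignment is simple enough to state in a few lines of Python. This is a sketch with a hypothetical function name, `assign_slices`, and it numbers slices and nodes from zero.

```python
def assign_slices(num_slices, num_nodes):
    """Static, interleaved slab distribution: slice s lives on node s mod N."""
    return {node: list(range(node, num_slices, num_nodes))
            for node in range(num_nodes)}
```

For six slices on three nodes this gives each node two non-adjacent slices, one from the front half of the volume and one from the back half.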

The memory space issue arises from the need to buffer an entire
slice's Renderer instruction stream as well as store three sets of slice
data. This distribution stores three copies of the data set since we
need slice sets oriented perpendicularly to each of the data axes i, j,
and k. The set of slices most parallel to the view plane is used when
traversing the data set.

The proposed algorithm attempts to maximize the utilization of
Renderers and GPs without requiring an executive processor. Although
this implementation uses special hardware Renderer nodes,
rendering could be performed by general purpose processors. In
fact, the latter would offer freedom in allocating GP or Renderer tasks
as necessary to achieve optimal performance. The algorithm makes
use of image parallelism by assigning each Renderer to a unique
128x128 pixel screen region. Renderers receive splatting instructions
only for voxels that project to their region. Voxels near region
boundaries are splatted at two or four Renderers to eliminate seams
in the image. Since compositing must proceed in front-to-back or
back-to-front order, Renderers must receive slices in sequence.
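
The sorting of a splat into one, two, or four screen regions can be sketched as below. This Python sketch is illustrative: it assumes Renderer regions tile the screen in a regular grid and tests the splat's bounding box, which is conservative for elliptical kernels. The name `renderers_for_splat` and its parameters are hypothetical.

```python
def renderers_for_splat(x, y, radius, region=128, grid=(2, 2)):
    """Grid cells (col, row) of the Renderer regions a splat overlaps.

    (x, y) is the projected voxel centre in pixels, radius the kernel
    extent in pixels, grid the number of regions across and down.
    """
    cols, rows = grid
    c0 = max(int((x - radius) // region), 0)
    c1 = min(int((x + radius) // region), cols - 1)
    r0 = max(int((y - radius) // region), 0)
    r1 = min(int((y + radius) // region), rows - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

A splat in the interior of a region maps to one Renderer, one near an edge to two, and one near a corner to four.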

Figure 5 illustrates three GPs and two Renderers computing a frame
of a six slice data set. At the start of a frame, each GP shades and
transforms its front-most slice. Phong shading is accomplished via
a lookup table indexed by the voxel's gradient vector. An affine
transformation is performed by DDA methods requiring only three
adds per point after setup. Renderer instructions for splatting the
shaded and transformed voxels are sorted by screen regions and
placed into separate buffers. A token for each Renderer is circulated
among the GPs to indicate permission to send splat instructions to
that Renderer. Initially, all Renderer tokens originate at the GP with
the front-most slice. The T0 and T1 arcs in figure 5 represent the
tokens for Renderer0 and Renderer1 respectively. Upon receipt of a
token, a GP transmits the splat instructions for that Renderer's region.
The token is then passed to the GP with the next slice. The circulating
tokens ensure that Renderers receive slices in front-to-back sequence.
The tokens also allow multiple GPs to simultaneously
transmit instructions to different Renderers. When all the tokens
have passed through a GP, it computes the Renderer instructions for
its next slice. The GP with the last slice is responsible for instructing
the Renderers to transmit their final color values to the frame buffer.

3.1. Mesh Topology Extension - Since large mesh topology
machines are being built and commercially offered, it is of interest to
note that this general approach does extend to them. Extension to
square N x N mesh topologies requires that at least one edge of the
mesh be connected to N Renderers. The data slices are assigned to
mesh nodes sequentially row by row (Fig. 6). Tokens (N of them) are
circulated through the nodes as before. To avoid contention, we
specify that Manhattan-style routing is performed for all Renderer
messages; messages travel as far as possible in the direction sent, and
then, if needed, with one turn they head to their destination. In effect,
we utilize the mesh as a cross-bar interconnect. With some inspection
it should become apparent that, if tokens move in-step, Renderer
splat instructions will never compete for a communication link;
routing hardware will always be able to forward messages.

It should be noted that although the number of nodes increases as
$O(N^2)$, the number of Renderers increases only as $O(N)$. A scheme
that accommodates a limited N range is to place Renderers along both
sides of the array and allow 2N tokens to circulate. In a vertically
wrapped mesh, all 2N Renderers can receive instructions simultaneously
without incurring contention for the network. Figure 6 shows
sixteen nodes and eight Renderers with eight tokens in circulation.
Nodes with arrow arcs leaving them (nodes 2 - 9) have tokens and are
transmitting Renderer instructions. Utilized paths are marked with
the message's destination Renderer number.

Fig.6. Data distribution, Renderer region assignment,
and message paths for a mesh topology

A practical issue inhibiting this and other implementations of this
sort is the general difficulty of constructing distributed frame buffers
and their consequent commercial unavailability.

4.  Performance 

To  understand  the  behavior  of  this  system,  we  first  analyze  the 
performance  of  each  system  element.  Then  we  look  at  the  load 
balance  between  the  elements  and  how  that  affects  the  overall  system 
performance.

4.1. Renderer Performance - The Renderers digest six instruction
words per splat point and compute Q to ten bit precision in 77 cycles
of a 40 MHz clock. The squaring and scaling operations use about
900 cycles per pass. Compositing at the end of each slice requires
about 700 cycles. For a $128^3$ data set, the pass and slice overhead
totals 1,932,800 cycles for each Renderer, or about 50 ms. Based on
these cycle counts, splatting $128^3$ voxels on one Renderer should take
4.09 seconds including the 50 ms overhead. Experimental data
correlates well with this predicted Renderer performance. Using
twenty GPs and one Renderer, a $128^3$ cube of voxels is splatted in
4.38 seconds. This corresponds to a Renderer throughput of about
478,000 voxels per second. Use of multiple Renderers distributes the
voxel load while increasing only the GP token passing overhead.
With four Renderers, the same $128^3$ voxels are splatted in 1.29
seconds, or equivalently, a combined Renderer throughput of 1.622
Mvoxels per second.
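
The cycle-count estimate can be reproduced with a few lines of Python. This is a worked check under the stated figures (77 cycles per splat, 900 cycles per pass, 700 cycles per slice, 16 passes per slice, 40 MHz clock); the function names are hypothetical.

```python
CLOCK_HZ = 40e6  # Renderer clock rate quoted in the text

def renderer_estimate(n=128, splat_cycles=77, pass_cycles=900,
                      slice_cycles=700, passes_per_slice=16):
    """Overhead cycles and predicted seconds to splat an n^3 volume."""
    # Per-slice overhead: square/scale for each pass plus one composite.
    overhead = n * (passes_per_slice * pass_cycles + slice_cycles)
    total = overhead + n ** 3 * splat_cycles
    return overhead, total / CLOCK_HZ

def throughput(voxels, seconds):
    """Voxels splatted per second."""
    return voxels / seconds
```

The overhead term comes to 1,932,800 cycles and the predicted time rounds to 4.09 seconds; dividing the voxel count by the measured 4.38 seconds gives a throughput a little above 478,000 voxels per second.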

4.2. GP Performance - The GP cost of splatting a voxel is the view
transformation followed by four additions and six multiplies to
compute the QEE coefficients. The transformed voxel must also be
sorted by screen region so that the six instruction words to splat it are
placed in the proper Renderer's buffer. For a $128^2$ slice, a GP
processes 16,384 voxels into 98,304 buffered instruction words in
0.16 seconds, achieving a computation throughput of about 102,000
voxels per second. As tokens arrive, the buffered instructions are
transmitted to the Renderers. Pixel-Planes 5 GPs use specially
addressed read cycles to move data to the ring. This scheme achieves
>30 Mword per second peak throughput into the transmit FIFO. The
ring requires about 5 milliseconds to transmit this data. Message
software overhead adds roughly 4 milliseconds for the 192 message
packets transmitted, hence a GP is able to transmit a slice's buffered
Renderer instructions in under 10 milliseconds assuming no ring or
Renderer contention.
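
The per-slice GP figures follow from simple arithmetic, sketched here in Python. The ring estimate assumes that a single transfer uses one of the eight channels, i.e. one eighth of the 160 Mword per second aggregate bandwidth; that per-channel split is an assumption of this sketch, and the function name is hypothetical.

```python
def gp_slice_budget(n=128, words_per_voxel=6, slice_seconds=0.16,
                    ring_words_per_s=160e6, channels=8, software_s=0.004):
    """Voxels, instruction words, GP throughput, and transmit time per slice."""
    voxels = n * n
    words = voxels * words_per_voxel
    voxels_per_s = voxels / slice_seconds
    # Assumption: one transfer occupies a single ring channel.
    ring_s = words / (ring_words_per_s / channels)
    return voxels, words, voxels_per_s, ring_s + software_s
```

Under these assumptions the ring transfer takes just under 5 milliseconds and the total with software overhead stays below 10 milliseconds.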

In most volume data sets, many voxels are transparent and therefore
no Renderer splat instructions are generated for them. Passes or
slices that contain only transparent voxels produce no Renderer
instructions for splatting or overhead operations. Figure 7 shows
performance statistics for the 128x128x124 data set shown in Plate
2. About 32% of the voxels (664,486) are non-transparent and
actually rendered. The image size is determined by the number of
Renderers. Four Renderers produce a 256x256 image while nine and
sixteen Renderers produce 384x384 and 512x512 images respectively.
It is unusual, but with this sort of hardware larger images
render faster because of the increased Renderer parallelism.

4.3.  Load  Balance  -  In  heterogeneous  systems  load  balancing  is 
difficult.  Computing  resources  are  not  interchangeable  and  therefore 
cannot be shifted (without swapping boards) as needed to the task
most  burdening  the  system.  In  this  implementation,  either  Renderers 
or  GPs  can  limit  system  performance. 

Figure 7 illustrates the case where performance is limited by the
number of Renderers. This is often the case if there are many
non-transparent voxels to be splatted. Adding more than twenty GPs has
minimal effect unless the number of Renderers is increased above
nine. In the case of thirty GPs and four Renderers, GPs are waiting
over 0.5 seconds total for their first Renderer tokens after they have
finished processing slices. Figure 8 illustrates a Renderer-bound
frame with three GPs, two Renderers, and a six slice data set. The
shaded areas are wasted GP waiting time. Circulating tokens are
shown as arcs and marked T0 and T1 for their respective Renderers.

When a very high percentage of voxels are transparent, the system
behavior changes. This occurs in the case of isosurface extraction.
Figure 9 shows performance statistics for a 128x128x128 data set
where about 11% of the voxels (238,637) are non-transparent and
actually rendered. Here, the GP slice traversal time dominates the
system performance. With so few voxels actually getting splatted,
using more than nine Renderers has no appreciable benefit. Figure
5 illustrates a GP-bound frame with three GPs, two Renderers, and
a six slice data set. The shaded arcs represent idle Renderer time.
Tokens have been sent to GP0 where they must wait for the traversal
of the next slice to complete before Renderer instructions can be sent.

Fig.8. Renderer-Bound Frame


Finally,  it  should  be  observed  that  the  limiting  resource  utilization 
does  not  approach  100%  in  either  the  GP-bound  or  Renderer-bound 
test  case.  Utilization  peaks  only  in  the  unlikely  situation  where  each 
GP's slices are an identical workload and the Renderers are each hit
by the same number of voxels every slice. Since the system
resynchronizes every slice due to the token passing, each slice has its
own load balance. The overall behavior of the system could resemble
GP-bound, yet a number of slices in a frame may actually be
Renderer-bound, thereby lowering the GP utilization. In most cases
then, neither the GPs nor Renderers are fully utilized. This is
unfortunately symptomatic of many parallel algorithms - computing
resource utilization often decreases as parallelism increases.

4.4. Progressive Refinement - The interactive response of a system
can often be increased at the expense of image quality. Usually this
is done by undersampling somewhere during the rendering process.
Using the pixel processor approach, there is no advantage to
undersampling in screen space and then interpolating the remaining
pixels. In fact, as pointed out before, frame rate often increases as the
number of Renderers, and therefore screen pixels, increases. Instead
we may undersample the volume itself. The undersampling may be
adaptive [Laur*91] or a regular skipping of some fraction of voxels.
The latter is simple to implement by rendering every other voxel in
all directions of the data set. A $128^3$ data set is, for example,
effectively rendered as $64^3$. While the speed up varies due to load
balance issues and relative frame overhead, this technique usually
yields at least a factor of five. As an example, consider the small
system of ten GPs and four Renderers that produces the image in
Plate 2 at 0.93 Hz. Undersampling the volume as a 64x64x62 data
set produces a slightly blurred image at a rate of 6.15 Hz, a speed up
of over 660%. Undersampled image quality can be improved by
rendering a separate prefiltered data set instead of simply skipping
voxels.

Fig.9. GP-Bound Performance

5. Summary and Discussion

This paper presented a distributed algorithm for volume rendering
on multicomputers along with two methods for using a pixel
processor array to accelerate splatting. The algorithm was implemented
on a 1D ring topology and its extension to a 2D mesh
topology was outlined. The splat acceleration technique was demonstrated
on a processor array with QEE capability. An alternative
approach using texture table lookup was proposed for other pixel
processor array architectures.

This implementation is not presented as the fastest or best way of
doing volume rendering, but as a promising alternative approach
whose merits are system and application dependent. The algorithm
was implemented on Pixel-Planes 5 since that machine was available
and was the inspiration of the pixel processor rendering idea to begin
with. This system has no less than five different parallel volume
rendering approaches implemented on it at this time. It is a credit to
its designers that this is so, since volume rendering was never an
explicit design consideration. The algorithm is implemented in C;
some of the low level communications library routines were crafted
in i860 assembly code.

Many issues remain for consideration in future work. The errors
produced by view dependent filter kernels need further analysis.
Faster pixel processor arrays are desirable, perhaps with nibble or
byte wide data paths. The textured filter kernel method should be
explored on a suitable machine. Other parallel algorithms that cope
with load imbalance and offer adaptive processing savings should be
investigated. The algorithmic impact of different data distributions
should also be studied, particularly dynamic distributions in which
data migrates among the GPs as the view changes [Neumann91].

Abstract

We describe a software system on the Pixel-Planes 5 graphics
engine that displays user-defined antialiased procedural
textures at rates of about 30 frames per second for use in real-time
graphics applications. Our system allows a user to create
textures  that  can  modulate  both  diffuse  and  specular  color,  the 
sharpness  of  specular  highlights,  the  amount  of  transparency 
and  the  surface  normals  of  an  object.  We  describe  a  texture 
editor  that  allows  a  user  to  interactively  create  and  edit 
procedural  textures.  Antialiasing  is  essential  for  real-time 
textures,  and  in  this  paper  we  present  some  techniques  for 
antialiasing  procedural  textures.  Another  direction  we  are 
exploring  is  the  use  of  dynamic  textures,  which  are  functions 
of  time  or  orientation.  Examples  of  textures  we  have 
generated  include  a  translucent  fire  texture  that  waves  and 
flickers  and  an  animated  water  texture  that  shows  the  use  of 
both  environment  mapping  and  normal  perturbation  (bump 
mapping). 

Introduction 

The current trend in graphics libraries is to give users
complete control of an object's surface properties by
providing a language specifically for shading [Hanrahan &
Lawson 90]. There are two lines of research that have come
together to form modern shading languages. One line of
research is the notion of programmable shaders, which has its
roots in the flexibility of the shader dispatcher [Whitted &
Weimer 82] and which was expanded to fully programmable
shaders in [Cook 84]. The other research track is the use of
mathematical function composition to create textures
[Schachter 80] [Gardner 84]. These two lines of research were
dramatically brought together to produce a mature shading
language in the work of Ken Perlin [Perlin 85]. There are now
several graphics machines fast enough to bring some of this
flexibility to real-time graphics applications [Apgar 88]
[Potmesil & Hoffert 89] [Fuchs 89]. This is the point of
departure for our research.

The organization of this paper is as follows: a discussion of the
pros and cons of procedural textures; an overview of the Pixel-Planes
5 hardware and software; a brief description of our
language for composing textures; an outline of the algorithms
involved in displaying such textures on Pixel-Planes 5; a
description of an interactive texture editor that dynamically
displays a texture as the user changes its parameters; examples
of dynamic textures; examples of applications that make use
of the texture capabilities of our system; and future directions
for this research.

Why  Use  Procedural  Textures? 

Procedural  textures  provide  an  alternative  to  the  choice  of 
image-based  textures.  The  central  tradeoff  between  image 
and  procedural  textures  is  between  memory  cost  and 
execution  time. 

Graphics  architectures  that  are  well-suited  for  displaying 
image  textures  typically  have  large  amounts  of  memory 
associated  with  a  handful  of  fast  processors.  Each  processor 
retains  a  copy  of  every  image  texture  for  a  given  scene  so  that 
any  processor  can  perform  the  texture  look-up  at  any  given 
pixel  in  the  scene.  Texture  evaluation  thus  has  a  small,  fixed 
computational  cost,  at  the  expense  of  using  large  amounts  of 
memory to store the texture copies. The Silicon Graphics
Skywriter and the Star Graphicon 2000 are two commercial
graphics  engines  that  use  this  approach  with  impressive 
results. 

Our implementation of procedural textures on Pixel-Planes 5
provides  a  look  at  the  opposite  end  of  this  spectrum.  Each 
pixel  processor  has  only  208  bits  of  memory,  but  the  graphics 
machine  may  be  configured  to  have  on  the  order  of  256,000 
pixel  processors,  giving  the  ability  to  perform  several  billion 
instructions  per  second.  Their  very  small  memory  makes  the 
pixel  processors  poor  for  rendering  image-based  textures  but 
their  computational  power  makes  them  ideal  for  generating 
procedural  textures  on-the-fly. 

It is clear that any procedural texture can be computed once,
saved as an image, and used in a scene like any other image
texture. In this sense, it can be argued that image-based
textures offer everything that procedural textures can provide,
with the only additional cost being the use of more memory.
Also, it is clear that procedural textures are a poor choice when
the scene requires a picture hanging on the wall or an image
on the cover of a book. Nevertheless, procedural textures do
have benefits of their own. One benefit is that the texture can
be arbitrarily detailed, provided that the texture coordinates
are represented with enough bits. Each additional bit added to
the computation of a function of two variables is reflected by a
factor of four in memory cost to mimic the texture with a
stored image. A more dramatic benefit is the ability to define
textures which are functions of many variables, such as
animated textures and solid textures. The memory capacity of
graphics systems that we are familiar with is not large enough
to explicitly store such textures. Pixel-Planes 5 offers us the
alternative of evaluating on demand the values from textures
of several variables.
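The factor-of-four remark can be checked with a little arithmetic. The sketch below is illustrative only: the function name and the idea of counting one stored sample per distinct coordinate combination are our assumptions, not part of the Pixel-Planes 5 software.

```python
def stored_samples(bits_per_coordinate, num_variables=2):
    """Samples a stored image needs to match a procedural texture
    whose inputs each carry the given number of bits (assumed model)."""
    return (2 ** bits_per_coordinate) ** num_variables

# One extra bit in each of u and v quadruples the stored image.
growth_2d = stored_samples(9) // stored_samples(8)
# With a third variable (for example time) one extra bit costs a factor of eight.
growth_3d = stored_samples(9, 3) // stored_samples(8, 3)
```

This is why textures of three or more variables quickly outgrow image storage while costing a procedural texture nothing extra in memory.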

Pixel-Planes  5  Overview 

Hardware - The Pixel-Planes 5 machine has multiple Intel
i860-based Graphics Processors (GPs) and multiple SIMD
pixel processor arrays called Renderers. A Renderer is a
128x128 array of bit-serial pixel processors, each with 208
bits of local memory, called pixel memory, and 128x32 bits of
off-chip backing store memory. Each Renderer can be
mapped to any 128x128 pixel region of an image. The
Renderer processors are capable of general arithmetic and
logical operations and operate in SIMD mode. Each
processor has an enable bit that regulates its participation in
instructions. Graphics Processors, Renderers, Frame Buffers,
and workstation host communicate over a shared 640 Mb/sec
ring network.

Software - Generating images with textured polygons on
Pixel-Planes 5 is a multi-stage process which can be viewed
as a graphics pipeline [Fuchs 89], as shown in Figure 1.
Transparent polygons are handled by making multiple passes
through the pipeline. In the first stage of the graphics pipeline,
the Graphics Processors transform the polygon vertices from
model space to perspective screen space and create SIMD
instruction streams (Image Generation Controller or IGC
commands) for the Renderers to rasterize the polygons. A
Z-buffer algorithm is executed in parallel for all pixels within a
polygon. During rasterization, intrinsic color components,
surface normals, texture u,v coordinates, texture scale factor
(used for antialiasing), texture-id, etc., are stored in the pixels.
After rasterization of all polygons, each pixel processor has
the parameters of its front-most polygon. These parameters
are then used in the next two stages of the pipeline: texture
program interpretation and lighting model computation. At
the beginning of texture program interpretation, some
initialization is performed. The rasterization phase actually
stores u·z and v·z rather than u and v in pixel memory (since
u·z and v·z are linear in screen space), so a z division is
needed. Also a time value is stored in pixel memory for use
in animated textures. The lighting model currently used is
Phong shading. Since all pixels are handled concurrently after
all rasterization, we call this approach deferred shading.
Because of the high degree of parallelism achieved during
deferred shading, we can afford to have quite elaborate
procedural textures and lighting models while maintaining
high frame rates.

Figure 1: Pixel-Planes 5 Graphics Pipeline
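The reason for storing u·z instead of u can be illustrated with a small sketch. It assumes that z denotes a quantity proportional to reciprocal eye-space depth, which is our reading and is not stated in the text; the function names and the parameter w (eye-space depth) are hypothetical.

```python
def texture_u_with_z_divide(u0, w0, u1, w1, s):
    """u at screen-space fraction s between two projected endpoints.

    w0, w1 are eye-space depths; z = 1/w is the assumed meaning of z.
    """
    z0, z1 = 1.0 / w0, 1.0 / w1
    uz = (1.0 - s) * (u0 * z0) + s * (u1 * z1)  # u*z interpolates linearly
    z = (1.0 - s) * z0 + s * z1                 # so does z
    return uz / z                               # the deferred z division

def texture_u_naive(u0, u1, s):
    """Linear interpolation of u itself, which ignores perspective."""
    return (1.0 - s) * u0 + s * u1

# Edge running from depth 1 to depth 3, sampled halfway across the screen.
correct = texture_u_with_z_divide(0.0, 1.0, 1.0, 3.0, 0.5)
naive = texture_u_naive(0.0, 1.0, 0.5)
```

For this edge the screen midpoint corresponds to u = 0.25 on the polygon, whereas direct interpolation of u would give 0.5.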

Texture  Programs 

Programming Model - Procedural textures are implemented
via a simple virtual machine. This texture machine comprises
an assembly language-like instruction set called T-codes, a set
of registers in pixel memory, and a set of parameters in the
Graphics Processor memory. The pixel parameters, such as
intrinsic color, u,v coordinates, etc., are accessible to the
texture machine via its pixel memory registers. The Graphics
Processors execute the T-codes interpretively, modifying the
pixel variables that affect shading. More exactly,
interpretation of a T-code program produces an IGC
command instruction stream, which is routed to the
appropriate Renderers for SIMD execution.

T-Codes - There are three kinds of T-codes: generators, which
produce several basic texture patterns; operators, which
perform simple arithmetic operations on texture patterns; and
conditionals, which permit selected pixels to be included or
excluded in a computation. Generators include Perlin’s
band-limited noise function [Perlin 85], Gardner’s sum-of-sines
[Gardner 84], antialiased square waves, and a Julia set.
Examples of operators include add, scale, max, square root,
splines, and color table lookup. These operators can be
cascaded to implement arbitrary functional composition.
There are T-codes for conditional execution (by having
selected pixel processors conditionally disable themselves),
but no T-codes for looping. Adding a new T-code to our
system is a straightforward task. Besides coding and testing of
the T-code subroutine in C, the programmer needs only to
update the T-code assembler parse table and the T-code
subroutine dispatch table.
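A dispatch-table interpreter with per-pixel enable bits can be sketched in a few lines. This is a toy model under stated assumptions: it executes operations directly on Python lists instead of emitting IGC commands, and the opcode names, the `disable_lt` conditional, and the register layout are hypothetical, not the actual T-code set.

```python
import operator

def binary(fn):
    """Build an operator T-code that writes fn(a, b) into dst for enabled pixels."""
    def op(regs, enabled, dst, a, b):
        for i, on in enumerate(enabled):
            if on:
                regs[dst][i] = fn(regs[a][i], regs[b][i])
    return op

def disable_if_less(regs, enabled, a, b):
    """Conditional T-code: pixels where a < b switch themselves off."""
    for i in range(len(enabled)):
        if regs[a][i] < regs[b][i]:
            enabled[i] = False

# Adding a new T-code means writing one function and adding one entry here.
DISPATCH = {
    "add": binary(operator.add),
    "sub": binary(operator.sub),
    "mul": binary(operator.mul),
    "max": binary(max),
    "disable_lt": disable_if_less,
}

def run(program, regs, enabled):
    for opcode, *args in program:
        DISPATCH[opcode](regs, enabled, *args)

# Two pixels; the second disables itself before the add executes.
regs = {"R": [1.0, 0.0], "S": [0.0, 1.0], "W": [0.0, 0.0]}
enabled = [True, True]
run([("disable_lt", "R", "S"), ("add", "W", "R", "S")], regs, enabled)
```

Every pixel sees the same instruction stream, and the enable bits alone decide which pixels take part, which mirrors the SIMD model described above.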

Sample Texture Program - The following T-code fragment
computes an antialiased black and white checkerboard
pattern. The U and V registers contain the texture coordinates,
and the D register contains the texture scale factor. Output is
to the diffuse color components D_Red, D_Green and
D_Blue. The swave generator produces antialiased square
waves in one dimension. Note how the outputs of the
generators are combined by continuous operators for
antialiasing, rather than using bitwise exclusive-OR.

# make antialiased square wave in U direction
swave R,U,D; swave_params
# make antialiased square wave in V direction
swave S,V,D; swave_params
# R and S registers now contain stripes
mul T,R,S
add W,R,S
sub W,W,T
sub W,W,T
# W := R+S-2*R*S, continuous exclusive OR
# set diffuse colors from W
copy D_Red,W
copy D_Green,W
copy D_Blue,W
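The continuous exclusive-OR used in the fragment can be checked outside the texture machine. The sketch below is ours (the function name is hypothetical); it only evaluates the expression R+S-2*R*S from the T-code comment.

```python
def continuous_xor(r, s):
    """R + S - 2*R*S: equals XOR on {0, 1} and blends smoothly in between."""
    return r + s - 2.0 * r * s

# At the corners the expression reproduces the exclusive-OR truth table.
corners = [continuous_xor(r, s) for r in (0.0, 1.0) for s in (0.0, 1.0)]

# When one stripe is fully blurred to 0.5, the result is mid-gray
# regardless of the other stripe.
blurred = continuous_xor(0.5, 0.3)
```

Because the inputs are already antialiased values between 0 and 1, the product form keeps the edges of the checkerboard squares smooth where a bitwise operation would reintroduce hard steps.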

Certainly, our texture programming language is hardly
state-of-the-art with respect to programming ease. This is
compensated to some extent by the fact that texture programs
tend to be rather short - typically 20-40 instructions. The
programs are short because the built-in generators and some
of the operators (such as spline and color table lookup) are
fairly powerful. The main job of the texture programmer is
producing the appropriate “glue” code to tie these together. In
addition, as discussed later, programming is facilitated by an
interactive texture editor program which allows the use of
macros.


Antialiasing  Techniques 

Antialiasing  of  procedural  textures  is  a  difficult  problem  to 
which  we  have  not  found  a  general  solution;  instead  we  have 
developed  a  few  techniques  which  work  fairly  well  for  many 
texture  programs.  The  theoretically  proper  method  is  to 
convolve  the  texture  with  a  filter  kernel  of  an  appropriate 
shape,  centered  at  the  pixel.  In  principle,  this  is  possible  since 
each  pixel  processor  knows  the  entire  texture,  but  in  practice, 
this  can  be  done  only  for  the  simplest  textures,  because 
integrating  arbitrary  functions  of  two  variables  is  difficult. 


Texture Procedure Evaluation Details

Pixel  Memory  Management  -  The  Pixel-Planes  5  Renderers 
contain  208  bits  of  on-chip  memory  and  4096  bits  of  off-chip 
backing  store  memory  per  pixel.  Backing  store  memory 
cannot be directly addressed by IGC instructions, but must be
swapped in and out by special instructions. Because texture
programs usually require the use of scratch memory space and
because a rather large number of pixel variables are needed to
support deferred shading, there is not enough pixel memory to
statically allocate it for the worst case. Therefore, a pixel
memory manager keeps track of the locations of the variables
and performs memory movement and backing store
swapping to make available required amounts of scratch
memory space.

Caching  of  IGC  Commands  -  For  static  texture  programs, 
the  IGC  commands  do  not  change  from  frame  to  frame,  and 
thus  the  T-code  translation  step  need  occur  only  once.  Note 
that  static  texture  programs  do  not  imply  static  textures;  the 
result  of  executing  a  texture  program  may  vary  with  time,  if 
time  is  an  input  variable.  During  texture  parameter  editing, 
the  T-code  program  must  be  reinterpreted  each  time  it  is 
changed. The Graphics Processors cache the IGC commands
resulting from texture interpretation to avoid generating them
repeatedly.
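The caching idea can be sketched as a lookup keyed on the program text. Everything named here is an assumption for illustration: `IgcCommandCache`, `fake_translate`, and the representation of commands as a tuple of strings do not come from the Pixel-Planes 5 software.

```python
class IgcCommandCache:
    """Re-translate a T-code program only when its text changes."""

    def __init__(self, translate):
        self._translate = translate      # hypothetical T-code -> IGC translator
        self._commands = {}
        self.translation_count = 0

    def commands_for(self, tcode_text):
        if tcode_text not in self._commands:
            self._commands[tcode_text] = self._translate(tcode_text)
            self.translation_count += 1
        return self._commands[tcode_text]

def fake_translate(tcode_text):
    """Stand-in translator: one 'command' per non-empty source line."""
    return tuple(line.strip() for line in tcode_text.splitlines() if line.strip())

cache = IgcCommandCache(fake_translate)
program = "mul T,R,S\nadd W,R,S"
for frame in range(3):                  # static program: translated once
    cache.commands_for(program)
edited = program.replace("add", "sub")  # an edit forces one new translation
cache.commands_for(edited)
```

A time-varying texture still hits the cache, because time arrives as an input value at execution and does not alter the program text.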

Region-Hit Flags - Since each Renderer covers a small
(128x128 pixel) region of the screen, it is likely that only a
small subset of the textures will be represented in a given
region. The Graphics Processors flag each region that any
textured polygon intersects as needing that particular texture.
The Graphics Processor that creates the texturing commands
for a particular region checks the OR’ed flags from all
Graphics Processors for that region, and creates and sends the
texture programs for only those textures that might be visible.
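A minimal sketch of the flag combination follows, assuming each Graphics Processor keeps one bitmask per screen region with bit k set when texture k may appear there. The data layout and function name are our assumptions.

```python
def textures_needed(per_gp_flags, region):
    """OR together every processor's flags for a region and list the texture ids."""
    combined = 0
    for flags in per_gp_flags:
        combined |= flags.get(region, 0)
    return [tid for tid in range(combined.bit_length()) if (combined >> tid) & 1]

# Two processors: region 0 needs textures 0 and 2, region 1 needs texture 1.
per_gp_flags = [
    {0: 0b001},
    {0: 0b100, 1: 0b010},
]
region0 = textures_needed(per_gp_flags, 0)
region1 = textures_needed(per_gp_flags, 1)
region2 = textures_needed(per_gp_flags, 2)  # no polygon touched this region
```

The flags are conservative: a flagged texture may end up hidden behind nearer polygons, which is why the text says "might be visible".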

Obtaining  Real-Time  Performance 


In order to do antialiasing, we need some estimate at each pixel
of how an area element in screen space maps into texture
space. Ideally, we would use the derivatives of u and v with
respect to screen space x and y. However, because of limited
pixel memory, we decided to record this estimate using a
single number, called the texture scale factor. This number is
intended to represent the maximum magnification factor that
can occur when a unit vector in screen space is mapped to
texture space. The texture scale factor is available in a pixel
memory register for use in T-code programs. The
approximation we use is $\max(|u_x| + |u_y|,\ |v_x| + |v_y|)$,
where subscripts denote derivatives with respect to screen x
and y. This is within a factor of 1.42 of the commonly used
formula $\max\left(\sqrt{u_x^2 + u_y^2},\ \sqrt{v_x^2 + v_y^2}\right)$
for MIPmaps [Williams 83].
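The two estimates can be compared directly. The sketch below is ours (function names are hypothetical); it evaluates the sum-of-absolute-values approximation and the MIPmap-style formula and shows where the bound of about 1.42, that is the square root of 2, comes from.

```python
import math

def scale_factor_sum(ux, uy, vx, vy):
    """Approximation used in the text: max(|ux| + |uy|, |vx| + |vy|)."""
    return max(abs(ux) + abs(uy), abs(vx) + abs(vy))

def scale_factor_mipmap(ux, uy, vx, vy):
    """MIPmap-style formula: max of the two gradient lengths."""
    return max(math.hypot(ux, uy), math.hypot(vx, vy))

# Worst case: equal derivatives in x and y give a ratio of sqrt(2).
worst_ratio = scale_factor_sum(1, 1, 0, 0) / scale_factor_mipmap(1, 1, 0, 0)
# Best case: an axis-aligned gradient gives identical values.
best_ratio = scale_factor_sum(1, 0, 0, 0) / scale_factor_mipmap(1, 0, 0, 0)
```

Since the sum form is never smaller than the gradient length, it errs on the side of extra blur, never extra aliasing.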

Because of this, our textures are over-blurred when viewed at
certain angles, just like MIPmaps. The texture scale factor is
computed for polygons as follows. When the polygon is
rasterized, the u and v coordinates at the middle of the
polygon, $u_m$ and $v_m$, are computed. The linear expression
$uz = ax + by + c$ is differentiated to give $u_x z + u z_x = a$,
which is solved for the constant $u_x z = a - u_m z_x$. Similarly
$u_y z$, $v_x z$, and $v_y z$ are computed. From these
$\max(|u_x z| + |u_y z|,\ |v_x z| + |v_y z|)$ is
computed and stored in pixel memory. Finally, just before
texture program evaluation, a parallel z divide is performed
for all pixels. This is, of course, an approximation due to the
substitution of $u_m$ for $u$. The approximation error manifests
itself as a difference in the amount of blurring at the corners
of a polygon that is being viewed at a very oblique angle (large
$z_x$). We found that the error is not noticeable in ordinary
scenes, although it can be seen in contrived test cases.
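The per-polygon computation described above can be sketched as follows. The parameter names are our own: (au, bu) and (av, bv) are the x and y coefficients of the linear expressions for uz and vz, and (zx, zy) are the screen-space derivatives of z.

```python
def polygon_scale_term(au, bu, av, bv, zx, zy, u_mid, v_mid):
    """Per-polygon constant max(|ux z| + |uy z|, |vx z| + |vy z|),
    using the mid-polygon u and v in place of the per-pixel values."""
    ux_z = au - u_mid * zx
    uy_z = bu - u_mid * zy
    vx_z = av - v_mid * zx
    vy_z = bv - v_mid * zy
    return max(abs(ux_z) + abs(uy_z), abs(vx_z) + abs(vy_z))

def pixel_scale_factor(stored_term, z):
    """The parallel z divide performed just before texture evaluation."""
    return stored_term / abs(z)

# Example: uz = x, vz = y, z = 1 + 0.5*x, evaluated at the point x = 1, y = 0,
# which we also take as the polygon midpoint, so u = 2/3 and v = 0 there.
z_here = 1.0 + 0.5 * 1.0
term = polygon_scale_term(1.0, 0.0, 0.0, 1.0, 0.5, 0.0, 1.0 / z_here, 0.0)
factor = pixel_scale_factor(term, z_here)
```

At the midpoint itself the result is exact: differentiating u = x/(1 + 0.5x) and v = y/(1 + 0.5x) by hand gives u_x = 4/9 and v_y = 2/3, so the scale factor is 2/3. Away from the midpoint the substitution of the mid-polygon values introduces the error discussed in the text.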


The  antialiased  square  wave  generator  produces  an 
antialiased  stripe  pattern  with  a  specified  phase,  frequency, 
and  duty  cycle.  The  generator  analytically  computes  the 
convolution  integral  of  a  box  filter  kernel  with  a  square  wave 
function  of  its  input  parameter  in  one  dimension.  The  width 
of  the  box  filter  is  the  texture  scale  factor.  Initially  we 
implemented  a  triangular  filter  kernel,  but  found  that  it 
required too much scratch pixel memory.
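A box-filtered square wave has a simple closed form, which the following sketch illustrates. The signature and defaults are assumptions; only the parameters phase, frequency, and duty cycle, and the use of the texture scale factor as the box width, come from the text.

```python
import math

def square_wave_integral(x, duty):
    """Integral from 0 to x of a unit-period wave that is 1 on [0, duty) and 0 after."""
    whole = math.floor(x)
    return whole * duty + min(x - whole, duty)

def swave(x, width, frequency=1.0, phase=0.0, duty=0.5):
    """Square wave convolved analytically with a box filter of the given width."""
    t = x * frequency + phase
    w = abs(width) * frequency
    if w < 1e-9:                      # no filtering: fall back to a point sample
        return 1.0 if (t - math.floor(t)) < duty else 0.0
    lo, hi = t - 0.5 * w, t + 0.5 * w
    return (square_wave_integral(hi, duty) - square_wave_integral(lo, duty)) / w

sharp_on = swave(0.25, 0.0)    # inside the "on" half of the period
sharp_off = swave(0.75, 0.0)   # inside the "off" half
averaged = swave(0.3, 1.0)     # box spans a full period: result is the duty cycle
```

Once the filter width reaches a whole period the stripes collapse to the duty cycle, a uniform gray, and nothing remains to alias.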


Our  goal  for  real-time  procedural  textures  was  to  deliver  at 
least  15  frames-per-second  to  real  applications  in  research 
projects at UNC. This goal has been met, and these
applications are described in a later section.

There are two crucial issues for rapid texture evaluation. The
first issue is to maximize utilization of the pixel processors.
This is achieved by waiting to execute the texture programs
until all polygons have been rasterized, so parallelism of the
texture programs can be maximized. In addition, by use of
region-hit flags, we avoid processing texture programs for
screen regions that don't have the texture. The second issue is
enabling the Graphics Processors to keep up with the
Renderers. This is accomplished by the IGC instruction
caching. We increased the performance of the Walkthrough
application from 2 to 20 frames/sec by the use of region-hit
flags and the IGC instruction caching.


A method that works for some textures is to antialias the final
color table lookup. The idea is to return a final color that is the
integral over some finite interval in the color table, rather than
a point sample. The width of the integration interval is
proportional to the texture scale factor times the maximum
gradient magnitude of color with respect to u and v. This
integral is simple enough to be computed analytically in the
pixel processors. If the gradient magnitude of the texture
value input to the color table is reasonably smooth, this
roughly approximates the correct convolution integral, and
does a fairly good job in practice for many textures. It fails
utterly for textures that are discontinuous functions of u and v.
This kind of texture gradually loses contrast as the texture
scale factor increases, but before the texture fades to a uniform
color, there is severe aliasing.

Another method works in the frequency domain. Some of our
texture programs “roll off” the amplitude of the band-limited
noise based on the texture scale factor. The result is that the
noise fades to a uniform value at scales where aliasing would
be a problem.
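The color table antialiasing described above can be sketched for a single channel. We assume, purely for illustration, a piecewise-constant table indexed by a value in [0, 1]; the function name and the clamping at the table ends are our choices.

```python
def table_average(table, center, width):
    """Average of a piecewise-constant table over
    [center - width/2, center + width/2], clamped to [0, 1]."""
    n = len(table)
    lo = max(0.0, center - 0.5 * width) * n
    hi = min(1.0, center + 0.5 * width) * n
    if hi - lo < 1e-9:                       # zero-width interval: point sample
        return table[min(n - 1, max(0, int(center * n)))]
    total = 0.0
    for i in range(int(lo), min(n, int(hi) + 1)):
        overlap = min(hi, i + 1) - max(lo, i)
        if overlap > 0:
            total += table[i] * overlap
    return total / (hi - lo)

stripes = [0.0, 1.0, 0.0, 1.0]
point = table_average(stripes, 0.3, 0.0)      # falls in the second entry
narrow = table_average(stripes, 0.375, 0.25)  # interval covers only the second entry
wide = table_average(stripes, 0.5, 1.0)       # interval covers the whole table
```

In use, the width would be set from the texture scale factor times the gradient magnitude of the table input, so distant surfaces integrate over more of the table.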

Interactive  Editing  of  Texture  Procedures 

An  interactive  texture  editor  eliminates  the  need  for  an  edit- 
compile-link-test  cycle.  Since  T-code  programs  are  executed 
interpretively at run time, texture procedures can be changed
without  recompilation.  Furthermore,  the  interpretation  phase 
is  fast  enough  so  that  literal  values  (Graphics  Processor 
parameters)  in  T-code  instructions  can  be  updated  in  a  single 
frame  time,  at  frame  rates  of  more  than  30  frames  per  second. 
The  texture  editor  displays  the  T-code  instructions  of  a 
selected  procedural  texture  in  a  text  window.  The  user  can 
position  a  movable  cursor  on  any  literal  value  in  a  T-code 
instruction,  and  then  smoothly  vary  this  value  via  a  joystick. 
The  dynamically  updated  texture  pattern  is  displayed  on  the 
graphics  system  with  a  two-frame  lag  (the  graphics  pipeline 
overlaps  two  frames).  At  over  30  frames  per  second  this  lag 
time  is  hardly  noticeable.  Hence  the  user  can  explore  the 
parameter  space  of  a  texture  procedure  continuously  in  real 
time. 

More  drastic  changes  to  texture  programs  can  be  made  by 
interactively  editing  the  text  of  the  program  in  another 
window  via  a  conventional  text  editor.  T-code  instructions 
can  be  added,  rearranged,  and  deleted,  producing  a  new 
program.  Then  with  a  couple  of  commands,  the  user  can  save 
the  updated  texture  program  and  reload  it  into  the  texture 
editor  for  immediate  display.  This  process  takes  from  one  to 
five  seconds,  which  due  to  the  more  discrete  nature  of  such 
changes,  can  still  be  viewed  as  interactive  editing. 

What the user sees on the graphics system is a complete scene
with  possibly  many  graphics  primitives  and  texture 
procedures,  not  just  a  single  isolated  texture  pattern.  The 
texture  editor  provides  a  complete  set  of  commands  to  access 
the  facilities  of  our  graphics  library.  Thus  the  user  can  change 
the  viewpoint,  move  objects  around,  change  the  locations  and 
parameters  of  light  sources,  etc.  This  is  important,  because  the 
appearance  of  a  texture  is  dependent  on  its  visual  context. 


Figure  2:  Generating  Flames 


Dynamic  Textures 

Textures have been traditionally considered to be functions of
spatial  coordinates  u  and  v.  A  generalized  texture,  however, 
need  not  be  restricted  to  just  mappings  from  the  spatial 
coordinates.  One  could  consider  a  texture  to  be  a  function  of 
several  other  parameters  as  well  -  time  and  surface  normal,  to 
mention  just  a  couple.  Procedural  textures  permit  us  to  create 
these  generalized  textures  without  the  memory  overheads  that 
would  be  required  with  image  textures.  Since  these  textures 
change  spatially  based  on  input  parameters  that  need  not  be 
restricted to just those that define the mapping, we prefer to
call  them  dynamic  textures. 

If we consider a texture to be a function of u, v, and t where
t is a time variable, we can produce time-varying animated
procedural textures such as a fire texture that flickers and
water waves that ripple. If we consider textures as functions
of u, v and n where n is the normal to the surface that has been
textured, then it is possible to do environment mapping by
defining an appropriate procedural texture. Dynamic textures
implemented this way can still be precomputed because the
program text for the texture doesn’t change. Another way to
produce dynamic textures is to edit the texture programs after
each frame, but then there is some loss of performance since
precomputation of IGC commands isn’t possible. In the
following  sections  we  describe  how  we  implemented  several 
dynamic  textures. 

Fire  -  An  example  of  an  animated  texture  is  a  flickering  flame. 
We  implement  a  fire  texture  as  follows  (Figure  2):  First 
perturb u by adding to it a 2D noise function of u and t. Then
generate a height field h by applying a 2D noise generator to
u and t. Compute flame intensity f = 1 - v/h; if f < 0, set f to 0.
This  creates  a  moving  outline  of  the  flame.  Because  of  the 
noise  perturbation  of  u,  the  outline  moves  both  vertically  and 
horizontally.  Finally  we  copy  f  to  opacity  and  use  a  color  table 
with  input  f  to  produce  color.  We  use  two  layers  of  transparent 
fire  texture  to  produce  the  fireplace  shown  in  Photo  3. 
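The flame outline computation can be sketched as below. Note the substitution: `pseudo_noise` is a sum of sines standing in for the band-limited noise generator, and the constants (`wobble`, `base_height`, `height_variation`) are invented so that h stays positive. None of these values come from the paper.

```python
import math

def pseudo_noise(x, t):
    """Stand-in for 2D band-limited noise: smooth and within [-1, 1]."""
    return 0.5 * (math.sin(3.1 * x + 1.7 * t) + math.sin(5.3 * x - 2.9 * t + 1.0))

def flame_intensity(u, v, t, wobble=0.1, base_height=0.6, height_variation=0.3):
    """f = 1 - v/h clamped at 0, with u perturbed by noise first."""
    u_perturbed = u + wobble * pseudo_noise(u, t)
    h = base_height + height_variation * pseudo_noise(u_perturbed + 10.0, t)
    return max(0.0, 1.0 - v / h)

# f doubles as opacity and as the input to a color table.
at_base = flame_intensity(0.3, 0.0, 1.2)   # bottom of the flame: fully opaque
above = flame_intensity(0.3, 1.0, 1.2)     # above the tallest possible flame
```

Advancing t moves the height field and the horizontal perturbation together, which gives the outline its vertical and sideways motion.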

Environment  Mapping  -  The  next  example  is  a  dynamic 
texture  depending  on  object  orientation  instead  of  time.  It 
implements  environment  mapping  of  a  simple  checkerboard 
pattern  onto  a  teapot.  The  textured  teapot  appears  to  be  located 
inside  a  room  with  checkerboard  walls,  as  shown  in  Photo  5. 
Rotating  the  object  lets  the  reflections  move  across  the  surface 
in  a  realistic  way.  We  accomplish  this  by  performing  typical 
environment mapping computations [Blinn & Newell 76]
(determine reflected eye vector, compute indices, compute
procedural texture as function of indices) in a T-code program
for  each  pixel. 
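The three steps named above can be sketched per pixel. This is an illustration under assumptions: we use a latitude-longitude indexing of the reflected vector, a point-sampled (not antialiased) checkerboard, and vectors in eye space with unit-length normals. The function names are ours.

```python
import math

def reflect(eye, normal):
    """Reflect the eye vector about a unit surface normal: E - 2(N.E)N."""
    d = sum(e * n for e, n in zip(eye, normal))
    return tuple(e - 2.0 * d * n for e, n in zip(eye, normal))

def environment_indices(r):
    """Map a direction to (s, t) in [0, 1] by longitude and latitude."""
    x, y, z = r
    length = math.sqrt(x * x + y * y + z * z)
    longitude = math.atan2(x, z)
    latitude = math.asin(max(-1.0, min(1.0, y / length)))
    return (longitude / (2.0 * math.pi) + 0.5, latitude / math.pi + 0.5)

def checker(s, t, squares=8):
    """Point-sampled checkerboard evaluated at the environment indices."""
    return (math.floor(s * squares) + math.floor(t * squares)) % 2

# Looking straight down -z at a surface facing the viewer.
reflected = reflect((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
s, t = environment_indices(reflected)
color = checker(s, t)
```

Because the normal is the only per-pixel input that changes as the object rotates, the reflection slides across the surface without any change to the program.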

Our current system has two limitations for environment
mapping. First, because the normal vector is only available in
eye space coordinates, the (infinitely distant) reflective
environment appears to be attached to the camera. Thus,
whenever the camera is rotated (panned, tilted or rolled), the
reflections move across the object’s surface in an erroneous
way. If we had enough pixel memory to store world space
normals this restriction could be removed. Second, we cannot
perform antialiasing properly, since we do not have surface
curvature information available in pixel memory.

Water  -  The  final  example,  shown  in  Photo  6,  is  an  animated 
texture approximating water waves by means of an animated
procedural bump map. This dynamic texture is a function of
both time and spatial orientation. The pixel normals are
perturbed on the basis of a height field whose value is
computed at each pixel. The derivatives required for the
normal perturbation are computed by finite differences. The
height field consists of superimposed circular and parallel
moving sinusoidal waves generated by a number of sources
distributed across the water-textured surface, a common
approach for this problem. The surface characteristics are
such that the water surface appears highly specular. In
addition, the normals are used to compute a simple
one-dimensional color scale environment map, which is used to
create  a  more  natural  appearance.  The  map  has  rotational 
symmetry  about  a  vertical  axis,  so  that  the  camera  can  be 
arbitrarily  panned.  However,  tilting  or  rolling  the  camera 
would  generate  erroneous  results,  for  the  reasons  mentioned 
in  connection  with  the  environment  mapping  texture.  As 
mentioned,  this  restriction  could  be  removed  by  storing  world 
space  normals  at  each  pixel.  We  also  have  a  problem  with 
determining  which  way  to  perturb  the  surface  normals,  since 
we  do  not  have  the  surface  tangent  vectors  in  the  u  and  v 
directions  available  in  pixel  memory.  We  can  circumvent  this 
problem  for  horizontal  polygons  (like  water  surfaces)  by 
broadcasting  the  current  transformation  matrix  to  the  pixel 
processors  during  the  texturing  phase  of  each  end-of-frame 
calculation.  The  scene  in  Photo  6  was  rendered  in  33 
milliseconds,  low  resolution,  with  24  GPs  and  12  Renderers. 
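The normal perturbation by finite differences can be sketched for a horizontal surface. Assumptions: the surface lies in the x-y plane with z up, only circular waves are modeled (the parallel waves are omitted), and the source tuple layout and step size are invented for this example.

```python
import math

def height(x, y, t, sources):
    """Sum of circular sinusoidal waves.

    Each source is (sx, sy, amplitude, wavenumber, speed).
    """
    total = 0.0
    for sx, sy, amplitude, wavenumber, speed in sources:
        distance = math.hypot(x - sx, y - sy)
        total += amplitude * math.sin(wavenumber * distance - speed * t)
    return total

def perturbed_normal(x, y, t, sources, eps=1e-3):
    """Unit normal of the height field, derivatives by central differences."""
    dh_dx = (height(x + eps, y, t, sources) - height(x - eps, y, t, sources)) / (2 * eps)
    dh_dy = (height(x, y + eps, t, sources) - height(x, y - eps, t, sources)) / (2 * eps)
    nx, ny, nz = -dh_dx, -dh_dy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

calm = perturbed_normal(0.3, 0.4, 1.0, [])
rippled = perturbed_normal(0.3, 0.4, 1.0, [(0.0, 0.0, 0.05, 20.0, 3.0)])
```

For a polygon that is not horizontal, the tangent directions in u and v would be needed to orient the perturbation, which is the difficulty noted in the text.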

Applications  Using  Procedural  Textures 

Pixel-Planes  5,  besides  being  a  research  project  in  its  own 
right,  is  also  an  important  resource  for  several  other  research 
projects  at  UNC.  Two  of  those  for  which  textures  are 
important  are  the  Building  Walkthrough  project  and  the  Head- 
Mounted  Display  project.  Both  of  these  use  a  stereo  head- 
mounted  display  and  head  tracking,  so  high  frame  rates  are 
necessary  to  maintain  the  illusion  of  the  virtual  environment. 

Walkthrough  -  The  UNC  Walkthrough  Project  aims  at  the 
development  of  a  system  for  creating  virtual  building 
environments [Brooks 86]. This is intended to help architects
and  their  clients  explore  a  proposed  building  design  prior  to  its 
construction,  correcting  problems  on  the  computer  instead  of 
in  concrete.  Texturing  plays  an  important  role  in  enhancing 
image  realism.  Having  textures  for  bricks,  wood,  ceiling  tiles, 
etc.,  adds  to  the  richness  of  the  virtual  building  environment 
and  gives  an  illusion  of  greater  scene  complexity.  The 
radiosity  illumination  model  is  used  in  the  Walkthrough 
project.  We  can  display  a  model  of  a  house  that  contains  about 
34,000  polygons  and  20  procedural  textures  at  15-20  frames/ 
sec  on  24  Graphics  Processors  and  12  Renderers  at  640x512 
resolution.  Photo  1  shows  a  view  of  the  living  room  of  the 
house,  and  Photo  2  shows  a  view  of  the  kitchen. 

For  enhanced  realism,  textures  have  been  integrated  with 
radiosity  in  Walkthrough.  There  are  two  stages  in  this 
integration.  The  first  stage  is  to  calculate  radiosity  values  for 
a  textured  polygon,  such  that  the  radiosity  effects  such  as  color 
bleeding  are  correctly  simulated  for  the  polygons  near  this 
textured polygon. The second stage is to shade the textured
polygon itself by the radiosity values at its vertices. To effect
the first step, the color of a textured polygon is assigned to be
the average color of its texture. This color is then used in the
radiosity process as usual. After the radiosity values at the
vertices of a polygon have been computed, they are passed as
input parameters to the procedural texture for this polygon
along with other input parameters such as the u and v
coordinates. These shading values are linearly interpolated
across the polygon. The procedural texture is computed as
before and a post-multiplication of the interpolated radiosity
shading values with the computed texture colors at each pixel
gives a smooth shading effect over the textured polygon.

Another  application  which  textures  find  in  the  Walkthrough 
project  is  that  they  offer  one  way  to  switch  lights  in  a  virtual 
building.  The  total  radiosity  illumination  of  a  polygon  is 
determined  by  the  dot-product  of  the  vector  of  light  values  and 
the  radiosity  vector  specifying  the  contribution  of  each  light 
source  to  the  illumination  of  the  polygon.  This  then  means  that 
given  the  latter,  the  user  can  vary  the  intensity  of  a  light  source 
and  observe  the  same  building  model  under  different  light 
scales  (but  same  light  positions),  by  just  computing  the  dot 
product as described before [Airey 90]. This however takes
roughly  3  -  5  seconds  for  a  dataset  of  roughly  30,000  polygons 
and  20  light  sources  if  done  sequentially  on  the  host 
workstation  and  fails  to  provide  the  effect  of  instantaneous 
light  switching.  One  possibility  to  do  this  fast  enough  to 
provide  an  instantaneous  effect  (under  a  tenth  of  a  second)  is 
to  do  this  in  parallel  by  using  T-codes.  The  idea  is  to  pass  the 
intensity  value  of  a  light  source  as  an  input  parameter  to  a  T- 
code  program  (along  with  the  polygon  colors)  which 
computes  the  dot-product  of  the  input  parameter  with  the 
value  of  the  interpolated  radiosity  (as  described  in  the 
preceding  paragraph)  and  uses  the  resulting  value  to  shade  the 
polygon.  Changing  the  intensity  of  a  light  source  can  then  be 
done  by  editing  the  T-code  program  and  changing  this  input 
parameter.  This  is  essentially  using  the  T-code  commands  as 
a  shading  language. 
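The light-switching computation reduces to one dot product per pixel followed by the post-multiplication with the texture color. The sketch below uses hypothetical names and treats the radiosity contributions as scalars per light source.

```python
def shade(light_intensities, radiosity_contributions, texture_color):
    """Dot the light values with the per-light radiosity contributions,
    then scale the texture color by the resulting illumination."""
    illumination = sum(
        light * contribution
        for light, contribution in zip(light_intensities, radiosity_contributions)
    )
    return tuple(illumination * channel for channel in texture_color)

contributions = [0.2, 0.4]      # interpolated radiosity per light at this pixel
brick = (1.0, 0.5, 0.0)         # texture color computed by the T-code program

both_on = shade([1.0, 0.5], contributions, brick)
second_off = shade([1.0, 0.0], contributions, brick)  # switch the second light off
```

Only the light intensities change when a light is switched, so the radiosity solution itself does not need to be recomputed.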

Head-Mounted  Display  -  In  the  Head-Mounted  Display 
project,  the  primary  use  of  textures  has  so  far  been  in  a 
mountain  bike  simulation,  where  the  user  rides  a  stationary 
bicycle and views simulated terrain through the
head-mounted display. Textures are used to increase the apparent
scene complexity and to improve the user’s perception of
motion through the environment. This application features
relatively few textures (grass, road, and cloudy sky), each of
which covers a fairly large area of the images. A scene from
this application is shown in Photo 4. The cloudy sky texture
makes use of the Gardner texture generator. The grass and
road textures make use of band-limited 2D noise, and are
antialiased by decreasing the noise amplitude as the texture
scale factor increases. Several frequencies of noise are used,
each with its own threshold for rolloff. This simulation runs
at 20-25 frames per second in low resolution (640x512) stereo
mode using 32 Graphics Processors and 2f. Renderers.

Future  Work 

The logical next step to our simple texture language is to
implement a full-fledged shading language that can be
executed on-the-fly. Using the deferred shading paradigm on
a high-end graphics machine, real-time execution of a shading
language such as RenderMan [Hanrahan & Lawson 90] seems
to be a very real possibility. Unfortunately this is impractical
on the current Pixel-Planes system due to the small amount of
memory available to the pixel processors. However, it is
likely that the Pixel-Flow machine [Molnar 91], now being
designed at UNC Chapel Hill, will have sufficient pixel
memory to make this idea viable.
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Abstract 

Tensor product surfaces are now widely used in application areas
such as industrial design and computer animation and thus the
quest for more effective design methods continues. Although several
methods exist for applying high-level operators such as bends,
twists and free-form deformations (FFD's), much less effort has
been applied to improving direct and precise free-form shaping
which is often desired. The dominant form of free-form manipulation
has been control-point based. Here we offer a manipulative
method that presents geometric properties (e.g. points on the
surface, normal vectors, etc.), rather than control vertices or deformation
lattices, and allows direct manipulation of these properties
at any selected point on the surface. The difficulties of interacting
with these three-dimensional geometric entities using both two- and
three-dimensional input devices are discussed, as are possible interactive
schemes using several such devices.

CR Categories: I.3.5 [Computer Graphics]: Computational
Geometry and Object Modelling - parametric surfaces; I.3.6 [Computer
Graphics]: Methodology and Techniques - interactive techniques,
direct manipulation, constraints; J.6 [Computer-Aided
Engineering]: Computer-Aided Design (CAD).

Keywords:  Computer-aided  geometric  design,  B-spline  sur¬ 
faces,  interactive  sculpting,  three-dimensional  interaction. 

1  Introduction 

Tensor product surfaces are a widely used primitive in many
geometric modelling systems. The majority of recent work in the
interactive aspects of modelling these surfaces has been in providing
high-level deformation tools [6, 12, 7]. In many applications,
however, free-form shaping is required that is not easily expressed
in terms of regular shape operators. Control point manipulation is
generally inappropriate in such cases, and construction of deformation
lattices for tool-based deformation is an indirect, often unnecessarily
tedious, solution. When specific geometric properties are
required, a designer should be able to select a point or region of a
surface and specify these target properties. Such direct manipulation
can be achieved for curves [10] and recent work on differential
manipulation [14] points to constraint based methods for surfaces.

Meanwhile, there is a constant drive to develop new geometric
forms that overcome these and other limitations of tensor product
surfaces. Triangular surface patches have long been available (see
[8]) as a blending primitive to help alleviate the topological restrictions
of tensor product surfaces. Meanwhile Loop and DeRose [11]
are two of many who have pursued multi-sided patches. More recently
a new modelling paradigm, based on triangular patches, has
been presented [3] that combines geometric constraints with sculpting
operations based on forces and loads that yield very fair shapes,
hence addressing both the topological restrictions and geometric
constraints. Although this method shows definite promise for the
engineering community, its suitability for non-technical users is unclear
as the interactive issues in dealing with forces and loads are
still being explored. Regardless of the promise of these new forms,
there is still a large investment in tensor product surfaces, and developers
would rather find ways to improve their current technology
than pursue a new approach.

There has been little attention paid to the problems of interactively
designing free-form geometric shapes in three dimensions,
not from the view of algorithms or tools, but in terms of direct interaction
with surface geometry in three dimensions. Clark [5] used
a hand held wand to select and reposition B-spline control vertices
as early as 1976, but not much has happened since that time. More
contemporary work in 3D interaction, including Bier [2], relies on
the use of construction aids that affect attributes being manipulated,
but in the case of surfaces, those attributes are almost always
the control points. More recently, Weimer and Ganapathy [13] have
used a VPL DataGlove to manipulate surfaces in an experimental
modelling environment. Although their system is far in advance
of Clark's work (incorporating voice and hand gestures for input)
their methods of manipulating curves and surfaces consist primarily
of free-hand sketching and direct control point manipulation in
space - methods that have long been achievable (albeit more awkwardly)
with a mouse or tablet. One of our goals is to use a pair of
"virtual hands" (at this point a pair of VPL DataGloves) that will
allow direct "hands-on" interaction with the surface itself, and not
its mathematical attributes, i.e. its control points. To this end,
we discuss a few methods we have implemented to provide more
intuitive shaping of the surface with a single DataGlove.

Accepting the fact that tensor product surfaces are widely used
and that their full potential has not yet been realised, we propose
methods of improving interaction with these surfaces. In Section 2
we illustrate how to efficiently solve and apply systems of differential
constraints to tensor product surfaces. In Section 3, the formulation
of geometric constraints in terms of differential constraints is
given, along with a discussion of additional degrees of freedom that
have intuitive geometric effects (which we refer to as uniform and
directional tension). We limit ourselves in this paper to the manipulation
of properties dependent solely upon first-order derivatives, deferring
a more analytical study of surface curvature to a later date.
In Section 4, we suggest possible ways of interacting with these
geometric properties and degrees of freedom with conventional 2D
input devices (a mouse or tablet), a Polhemus 3Space Isotrak and
a VPL DataGlove. We close with a summary and description of
related work in progress.

2  Differential  Constraints 

Recent  work  on  interactive  techniques  for  curves  has  led  to  the 
development  of  direct  manipulation  interfaces  that  do  not  rely  on 
user  interaction  with  control  points.  The  direct  manipulation  tech¬ 
nique  described  in  [1]  was  generalized  in  [10]  to  manipulate  higher 
order  properties  including  tangency  and  curvature.  Here  we  extend 
this  work  to  tensor  product  surfaces. 

For parametric curves, direct manipulation of geometric properties
was achieved by coordinating the parametric derivatives (to
achieve a specified geometry) and solving a linear system of equations
that enforced the required changes to these derivatives. We
refer to such specifications of derivatives as differential constraints,
whereas a set of differential constraints that achieves a specified geometric
property is referred to as a single geometric constraint.
Other sets of differential constraints that are significant, but not
necessarily geometrically intuitive, may be referred to loosely as
a parametric constraint.

Since  a  geometric  constraint  at  a  particular  point  on  the  curve  is 
determined  by  its  parametric  derivatives  at  that  parametric  point,  a 
geometric  constraint  will  generally  consist  of  an  underdetermined 
(in  a  minority  of  cases,  well-determined)  system  of  equations.  This 
statement,  however,  assumes  that  the  degree  of  the  curve  supports 
the  degree  of  the  properties  to  be  manipulated,  e.g.  specifying  a 
change  in  curvature  for  a  curve  that  is  a  straight  line  (a  linear  poly¬ 
nomial)  would  result  in  an  overdetermined  system,  requiring  the 
degree  of  the  curve  to  be  raised.  Overdetermined  systems  may  also 
occur  when  combining  multiple  geometric  constraints  at  different 
(but  nearby)  points  on  the  curve.  Such  a  system  can  always  be  made 
underdetermined  by  suitable  refinement. 

The solution obtained for these underdetermined systems was
that which minimized the combined movement of the control vertices
involved. This method is briefly summarized here. Its applicability
to tensor product surfaces is then illustrated.

2.1  Constraints  for  Curves 

A parametric curve segment of degree n (both non-rational and
rational) may be expressed as

$$\sum_{i=0}^{n} V_i B_i(u) = Q(u)$$

and thus its derivatives as

$$\sum_{i=0}^{n} V_i B_i^{(k)}(u) = Q^{(k)}(u).$$

A change to a derivative of the curve is similarly represented as

$$\sum_{i=0}^{n} \Delta V_i B_i^{(k)}(u) = \Delta Q^{(k)}(u)$$

which is referred to in matrix form as

$$B^{(k)}(u)\,\Delta V^T = \Delta Q^{(k)}(u). \qquad (1)$$

When shaping a curve in a design application, a user typically has
some target geometry in mind, e.g. a point to be interpolated, a
tangent line to be met, etc. Such targets can be expressed in terms
of changes to the derivatives of the curve at some chosen parametric
point. A system of equations containing these changes to the
derivatives is given as

$$B(\bar u)\,\Delta V^T = \Delta Q^T(\bar u) \qquad (2)$$

where each row of the matrix $B(\bar u)$ and vector $\Delta Q^T(\bar u)$ represents
a differential constraint, recall (1), applied at $\bar u$.

Note that we select $\bar u$ at which to apply the constraints. Although
we might allow the parametric point to vary in satisfying the constraints,
we select $\bar u$ to preserve the linearity of the system, which
allows extremely efficient interactive updates. The actual determination
of $\bar u$ can be performed geometrically by the user by either
selecting the curve itself, or specifying a nearby point to be interpolated,
in which case the closest parametric point on the curve
can be determined. In both of these cases, $\bar u$ is chosen implicitly by
a geometric specification on the user's part.

Conventional control point based manipulations require the user
to specify the left hand side of (2), i.e. the $\Delta V^T$, in order to achieve
a desired change to the shape of the curve reflected in the resulting
$\Delta Q^T$. The assessment of the success of this change is usually performed
visually, and is thus often inadequate (exact positioning of
control points or endpoints of the curve is often permitted, but not
an arbitrary point on the curve). Instead, we derive a suitable solution
for $\Delta V^T$ given a specification of the right hand side, $\Delta Q^T$,
i.e. given one or more geometric constraints.

The solution to the system in (2) is given in [10] as

$$\Delta V^T = B^T (B B^T)^{-1}\,\Delta Q^T \qquad (3)$$

where $B^T (B B^T)^{-1}$ turns out to be the right-inverse of B when the rows of B are
linearly independent (as is most often the case). For the sake of notational
convenience, when we refer to the inverse of a matrix, if it
is non-square, this will imply the right-inverse. Note that since $B^{-1}$
is independent of $\Delta Q^T$ (which changes in each iteration of an interactive
loop), the solution can be applied efficiently by precomputing
$B^{-1}$ for a given $\bar u$ (or set of $\bar u$'s at which constraints are applied).
This solution becomes particularly efficient when only one element
of $\Delta Q^T$ is nonzero (a common occurrence) in which case only one
column of $B^{-1}$ need be computed.
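
The right-inverse solution (3) can be illustrated with a short sketch. It is not the authors' code: it assumes a single cubic Bezier segment (Bernstein basis) rather than a B-spline, and the helper names `bernstein`, `bernstein_deriv`, `solve` and `min_norm_update` are hypothetical. Each row of `B` is one differential constraint and each row of `dQ` is the required change.

```python
from math import comb

def bernstein(n, u):
    """Values of the degree-n Bernstein basis functions at u."""
    return [comb(n, i) * u**i * (1 - u)**(n - i) for i in range(n + 1)]

def bernstein_deriv(n, u):
    """First derivatives of the degree-n Bernstein basis functions at u."""
    lower = [0.0] + bernstein(n - 1, u) + [0.0]
    return [n * (lower[i] - lower[i + 1]) for i in range(n + 1)]

def solve(a, b):
    """Solve the small square system a x = b by Gauss-Jordan elimination.
    b is a list of rows and may have several columns."""
    k = len(a)
    m = [list(a[r]) + list(b[r]) for r in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[k:] for row in m]

def min_norm_update(B, dQ):
    """Return dV = B^T (B B^T)^{-1} dQ as a list of per-vertex changes.

    B  : k x n list of rows, one differential constraint per row.
    dQ : k x d list of required changes (d = spatial dimension).
    """
    k, n, d = len(B), len(B[0]), len(dQ[0])
    gram = [[sum(B[r][i] * B[s][i] for i in range(n)) for s in range(k)]
            for r in range(k)]
    lam = solve(gram, dQ)  # (B B^T)^{-1} dQ
    return [[sum(B[r][i] * lam[r][c] for r in range(k)) for c in range(d)]
            for i in range(n)]

# Move the curve point at u = 0.4 up by one unit while keeping its
# first derivative unchanged (two differential constraints).
u_bar = 0.4
B = [bernstein(3, u_bar), bernstein_deriv(3, u_bar)]
dV = min_norm_update(B, [[0.0, 1.0], [0.0, 0.0]])
```

Because the factor $B^T (B B^T)^{-1}$ depends only on $\bar u$, an interactive loop would compute it once per selected point and reuse it as $\Delta Q^T$ changes; the sketch recomputes it on every call for brevity.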

One issue that arose with direct manipulation of curves was how
many and which control vertices to incorporate into the systems of
constraints. We found that the control vertex with maximal positional
influence on the curve at $\bar u$ should definitely be used, and
that one additional degree of freedom (control vertex) should be included,
i.e. if we have two constraints, then three control vertices
should be included in the system. Including this additional degree
of freedom reduced undesirable asymmetry that can result when the
solution is unique. The control vertices discarded from the system
should be those with the least influence on the curve at $\bar u$.
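
The selection rule above can be sketched as follows. The sketch assumes that "influence" is measured by the magnitude of each positional basis function at the chosen parameter, which the text suggests but does not state as a formula; the function names are hypothetical.

```python
def select_vertices(basis_values, num_constraints):
    """Indices of the control vertices kept in the system: the
    (num_constraints + 1) vertices with the largest basis values."""
    keep = num_constraints + 1
    ranked = sorted(range(len(basis_values)),
                    key=lambda i: abs(basis_values[i]), reverse=True)
    return sorted(ranked[:keep])

def mask_columns(B, kept):
    """Zero the columns of B that belong to discarded control vertices,
    so that those vertices receive no change in the solution."""
    kept = set(kept)
    return [[b if i in kept else 0.0 for i, b in enumerate(row)]
            for row in B]
```

For example, with positional basis values `[0.216, 0.432, 0.288, 0.064]` and two constraints, the three vertices with indices 0, 1 and 2 are retained and the fourth column of the constraint matrix is zeroed.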

2.2  Constraints  for  Surfaces 

The solution method described for curves can be applied to any
underdetermined system of linear equations. Tensor product surfaces
are merely bivariate polynomials that are expressed conveniently
in terms of univariate basis functions:

$$\sum_{i=0}^{n-1} \sum_{j=0}^{m-1} V_{ij}\,B_i(u)\,B_j(v) = S(u,v).$$

When we re-express this as a linear combination of control vertices
and bivariate basis functions

$$\sum_{i,j} V_{ij}\,B_{ij}(u,v) = S(u,v) \qquad (4)$$

where

$$B_{ij}(u,v) = B_i(u)\,B_j(v)$$

we  can  apply  the  methods  described  in  Section 
In keeping with the notation of Section 2.1 we use matrix notation
for our differential constraints. A system of constraints at some
parametric point $(\bar u, \bar v)$ will then be represented as

$$B(\bar u,\bar v)\,\Delta V^T = \Delta S^T(\bar u,\bar v) \qquad (5)$$

where $\Delta V$ is now indexed as a vector, rather than a matrix. At
$(\bar u, \bar v)$ we may now constrain any subset of partial derivatives by
selecting appropriate values for the right hand side. By applying
changes to appropriate partial derivatives in a controlled manner,
we can obtain direct control of the surface geometry at $(\bar u, \bar v)$.

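An alternate solution method can be applied in certain instances.
Recalling the matrix representation of a tensor product surface

$$B_0(u)\,V\,B_0(v)^T = S(u,v) \qquad (6)$$

we can express a set of changes to the derivatives of the surface as
follows

$$
\begin{bmatrix} B_0 \\ B_1 \\ \vdots \end{bmatrix}
\Delta V
\begin{bmatrix} B_0 \\ B_1 \\ \vdots \end{bmatrix}^T
=
\begin{bmatrix}
\Delta S_0 & \Delta S_v & \cdots \\
\Delta S_u & \Delta S_{uv} & \cdots \\
\vdots & \vdots & \ddots
\end{bmatrix}
\qquad (7)
$$

where we have omitted the $(\bar u, \bar v)$ for brevity. When a chosen submatrix
of the right hand side of (7) is fully specified we can apply
the curve solution to the rows and columns of $\Delta V$ to yield the solution

$$\Delta V = B(\bar u)^{-1}\,\Delta S(\bar u,\bar v)\,\left(B(\bar v)^{-1}\right)^T. \qquad (8)$$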

Here we precompute $B(\bar u)^{-1}$ and $B(\bar v)^{-1}$. The cost of applying
these two matrices to each $\Delta S$ is no greater than the cost of applying
the single matrix that would result using the bivariate vector
solution (5). The savings in (8) is in computing the right inverses of
two basis matrices with low row dimension (one or two in the cases
to be discussed), rather than one matrix whose row dimension is the
product of the row dimensions of these two matrices. When partial
derivatives in the right hand side submatrix are not specified, i.e.
are left free to vary, we use the vector solution (3).
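
For the simplest case, a single positional constraint, the separable solution (8) reduces to an outer product of two row right-inverses. The sketch below illustrates this under that assumption only; the function names are hypothetical, and the basis values passed in can come from any tensor product basis.

```python
def right_inverse_row(b):
    """Right inverse of a 1 x n row vector b, i.e. b^T / (b b^T)."""
    norm2 = sum(x * x for x in b)
    return [x / norm2 for x in b]

def position_update(bu, bv, dS):
    """dV[i][j] = B(u)^{-1} dS (B(v)^{-1})^T for one position constraint.

    bu, bv : basis function values at the selected u and v.
    dS     : desired displacement of the surface point (a tuple).
    """
    ru, rv = right_inverse_row(bu), right_inverse_row(bv)
    return [[tuple(a * b * c for c in dS) for b in rv] for a in ru]

def displacement(bu, bv, dV):
    """Change of the surface point produced by dV:
    the sum over i, j of bu[i] * bv[j] * dV[i][j]."""
    dim = len(dV[0][0])
    return tuple(
        sum(bu[i] * bv[j] * dV[i][j][c]
            for i in range(len(bu)) for j in range(len(bv)))
        for c in range(dim))
```

Only the two small row inverses are computed here, rather than the inverse of the full bivariate row of products $B_i(\bar u) B_j(\bar v)$, which is the saving the paragraph above describes.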

Based on our experience with curves, we generally choose not to
incorporate all control vertices defining the patch containing $(\bar u, \bar v)$.
Instead, we use the control vertex of maximal influence and an
additional degree of freedom in each parametric direction. This
amounts to introducing zeroes into the least significant columns of
$B(\bar u)$ and $B(\bar v)$. The resulting control vertices that are affected typically
belong to a rectangular subset of the control point mesh. This
has produced good results for most systems of constraints, although
in some cases the area of effect can be further reduced. In order to
involve a non-rectangular subset of control vertices, however, the
vector solution (3) must be used.

3  Geometric  Constraints 


Here we describe how appropriate systems of differential constraints
are formulated to achieve certain geometric properties, i.e.
how to construct geometric constraints. The details of how one
might interact graphically with the surface to control these properties
are deferred to Section 4. For this paper we restrict our discussion
to geometric properties that can be specified in terms of the
first derivatives of the basis functions

$$
\begin{bmatrix} B_0(\bar u) \\ B_1(\bar u) \end{bmatrix}
\Delta V
\begin{bmatrix} B_0(\bar v) \\ B_1(\bar v) \end{bmatrix}^T
=
\begin{bmatrix}
\Delta S_0(\bar u,\bar v) & \Delta S_v(\bar u,\bar v) \\
\Delta S_u(\bar u,\bar v) & \Delta S_{uv}(\bar u,\bar v)
\end{bmatrix}
$$

which actually includes the mixed partial (a second derivative)
commonly referred to as the "twist" vector. We assume that we
are dealing with a regular surface and that the derivatives we are
interested in (not necessarily all of the above) do not vanish at the
parametric points of interest. The examples in all illustrations are
of bicubic non-uniform B-spline surfaces.

3.1 Position

Controlling the position of a point on the surface is a straightforward
process. If a desired point is to be interpolated, a designer
can select a nearby point on the surface (for which we determine
$(\bar u, \bar v)$) and relocate the point to the target location. This is done
by applying a change in position to the surface at $(\bar u, \bar v)$ using the
single differential constraint

$$[\,B_0(\bar u)\,]\;\Delta V\;[\,B_0(\bar v)\,]^T = [\,\Delta S_0(\bar u,\bar v)\,].$$

Figures 1 and 2 illustrate examples of positional displacement
incorporating varying degrees of freedom. The gross asymmetry
in the former is absorbed by the inclusion of an extra degree of
freedom in each parameter. This extra degree of freedom involves
four control vertices and has produced good results in our current
B-spline modeller.

Figure 1: Positional displacement of a point near the centre of a
flat sheet. One control vertex is involved, hence the asymmetry.

Figure 2: Positional displacement as in Figure 1 using an extra
degree of freedom in each parametric direction. The previous
asymmetry is absorbed.

3.2 Tangent Plane Orientation

Controlling tangency for surfaces is a much less well-defined
task than was the case for curves. For a regular parametric surface,
the equation for the normal vector is given by

$$N = \frac{S_u(\bar u,\bar v) \times S_v(\bar u,\bar v)}{\| S_u(\bar u,\bar v) \times S_v(\bar u,\bar v) \|}$$

which in turn defines the tangent plane at $(\bar u, \bar v)$. To control the tangent
plane at $(\bar u, \bar v)$, while preserving $S(\bar u, \bar v)$, we use the following
system of differential constraints


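$$
\begin{bmatrix} B_0(\bar u,\bar v) \\ B_u(\bar u,\bar v) \\ B_v(\bar u,\bar v) \end{bmatrix}
\Delta V^T
=
\begin{bmatrix} 0 \\ \Delta S_u(\bar u,\bar v) \\ \Delta S_v(\bar u,\bar v) \end{bmatrix}
\qquad (9)
$$

and apply coordinated changes to $\Delta S_u$ and $\Delta S_v$.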

If we wish to change the normal vector (and hence the tangent
plane), we can do so by rotating the entire frame defined by N,
$S_u(\bar u, \bar v)$ and $S_v(\bar u, \bar v)$. Once a desired normal is obtained, both
$S_u(\bar u, \bar v)$ and $S_v(\bar u, \bar v)$ can be further manipulated within the tangent
plane for additional shaping freedom. When they are rotated about the
normal vector, a twisting effect results without affecting tangency.
In cases where one partial is to be manipulated while the other is
fixed, another of the right hand side entries will become zero.
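
The frame rotation just described can be sketched as follows. The sketch assumes the rotation is specified as an axis and an angle and uses Rodrigues' rotation formula; the paper does not say how its rotations are represented, and all function names are hypothetical. The returned differences are the values that would be supplied as $\Delta S_u$ and $\Delta S_v$ on the right hand side of the constraint system.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate(v, axis, angle):
    """Rotate v about the unit-length `axis` by `angle` radians
    (Rodrigues' rotation formula)."""
    c, s = math.cos(angle), math.sin(angle)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1 - c)
                 for i in range(3))

def normal(su, sv):
    """Unit normal of a regular surface from its two first partials."""
    n = cross(su, sv)
    length = math.sqrt(dot(n, n))
    return tuple(x / length for x in n)

def frame_rotation_deltas(su, sv, axis, angle):
    """Changes (dSu, dSv) that rotate both partials, and hence the
    normal, about `axis` by `angle`."""
    su2, sv2 = rotate(su, axis, angle), rotate(sv, axis, angle)
    return (tuple(a - b for a, b in zip(su2, su)),
            tuple(a - b for a, b in zip(sv2, sv)))
```

Choosing the current normal as the axis leaves the normal unchanged and yields the twisting effect; any other axis reorients the tangent plane.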


Figure 3: Normal orientation by rotating the frame about one of
the coordinate axes.


Figures 3 through 5 illustrate three examples of tangent plane
orientation by rotation of the partial derivative frame. Adding an
extra degree of freedom here incorporates nine control vertices,
which distributes the effects well, but over a potentially significant
area of the surface. Using bicubic B-splines, this results in 36
patches affected by the change. In cases where one or both partials
do not change significantly, a degree of freedom can be dropped in
one or both directions, reducing the area of effect without introducing
any "unbalanced" effects.


Figure 4: Alignment of the tangent plane with a vertical plane.


Figure 5: The "twisting" effect produced by rotation of the partial
derivative frame about the normal.


3.3  Tension 

In the case of curves, changing the magnitude of the first derivative
produced a "tension-like" effect (a marked change in curvature)
similar to that produced by changing the tension parameter of
β-splines or the weights of a rational curve. An analogous effect can
be created by changing the magnitudes of the partial derivatives at
a point on a surface. We apply two styles of this tension-like effect
while preserving the orientation of the tangent plane. Both styles
involve changes to (in general) both partial derivatives, thus we use
the same set of differential constraints in (9).

We refer to uniform tension as the effect created by scaling both
partial derivatives uniformly. Figures 6 and 7 illustrate uniform
tension applied while preserving position and tangent plane.


Figure 6: Uniform tension applied to the example in Figure 2.
Magnitudes of the partial derivatives are decreased.


Figure  7:  Uniform  tension  applied  to  the  example  in  Figure  2. 
Magnitudes  of  the  partial  derivatives  are  increased. 


We could also scale the partial derivatives individually, but this
restricts us to two directions of effect - directions which are parametrically
dependent. Instead, we allow application of tension in
a user-specified direction in the tangent plane, hence the term directional
tension. This directional tension is achieved by defining
an axis in the tangent plane (through the origin) and scaling the
perpendicular components of the partial derivatives relative to this
axis. Figures 8 and 9 give two examples of directional tension applied
along varying axes oblique to the partial derivatives.

We apply directional tension by mapping the tangent plane to the
XY-plane and aligning the axis of interest with the X axis. We then
apply the resulting transformation to the partial derivatives, scale
their resulting Y component, then apply the inverse transformation.
This works well for the vast majority of cases. When extreme directional
tension is applied in one direction and then a second direction
is chosen, the effect of the second deformation is sometimes not as
marked as expected. This can often be remedied by undoing the
effects of the previous deformation by rotating the partials so that
they "straddle" the deformation axis prior to performing the scaling
transformation.
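
The two tension styles can be sketched as follows, assuming the partial derivatives are available as 3D tuples. Instead of building explicit transformation matrices, the sketch expresses each partial in an orthonormal basis of the tangent plane whose first vector is the chosen axis, scales the second coordinate, and maps back; this is the same X-aligned mapping written in coordinate form. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def uniform_tension(su, sv, scale):
    """Scale both partial derivatives by the same factor."""
    return tuple(scale * c for c in su), tuple(scale * c for c in sv)

def directional_tension(su, sv, axis, scale):
    """Scale the components of su and sv perpendicular to `axis`.

    `axis` is projected into the tangent plane first; it must not be
    parallel to the surface normal.
    """
    n = unit(cross(su, sv))
    in_plane = tuple(a - dot(axis, n) * c for a, c in zip(axis, n))
    x = unit(in_plane)   # plays the role of the X axis
    y = cross(n, x)      # plays the role of the Y axis

    def scaled(p):
        px, py = dot(p, x), dot(p, y)   # map into the plane
        py *= scale                     # scale the Y component only
        return tuple(px * xi + py * yi for xi, yi in zip(x, y))

    return scaled(su), scaled(sv)
```

Subtracting the original partials from the returned ones gives the $\Delta S_u$ and $\Delta S_v$ to apply; for a positive scale the normal direction is unchanged, so the tangent plane is preserved.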


Figure 8: Directional tension applied to the example in Figure 7
along a parametrically independent axis.


Figure 9: Directional tension applied in the direction of one of
the partial derivatives.


3.4  Other  Parametric  Constraints 

There are other sets of differential constraints that may be of interest,
but that we choose not to include as geometric constraints, as
their effect on the surface is not necessarily geometrically intuitive.
We include them here as they may be useful in certain applications.

3.4.1 Direction Vectors

Although useful for curves, we have not found manipulation of
a single partial derivative at $(\bar u, \bar v)$ particularly useful for surfaces.
For curves the first derivative has a close association to the unit tangent,
so the parametric and geometric properties are closely linked.
For surfaces, the directions of parametric derivatives have a lesser
significance when considering the entire surface about a particular
point.

A partial derivative may be set simply by applying the required
change to the existing derivative. For example, in order to alter the
tangency of the curve through $(\bar u, \bar v)$ traveling in the u or v direction,
systems of the form

$$
\begin{bmatrix} B_0(\bar u) \\ B_1(\bar u) \end{bmatrix}
\Delta V\,
[\,B_0(\bar v)\,]^T
=
\begin{bmatrix} 0 \\ \Delta S_u(\bar u,\bar v) \end{bmatrix}
$$

or

$$
[\,B_0(\bar u)\,]\,
\Delta V
\begin{bmatrix} B_0(\bar v) \\ B_1(\bar v) \end{bmatrix}^T
=
\begin{bmatrix} 0 & \Delta S_v(\bar u,\bar v) \end{bmatrix}
$$

should be solved. If we incorporate the second derivative in the
same parametric direction, we can control curvature in that direction,
as described in [10].

3.4.2 The "Twist" Vector

The mixed partial, $S_{uv}(\bar u, \bar v)$, is often referred to as the "twist"
vector and has a long history in the construction of composite surfaces
[8]. If desired, the twist vector can be manipulated while the
normal to the surface is left free to vary using the following:

$$
\begin{bmatrix} B_0(\bar u,\bar v) \\ B_{uv}(\bar u,\bar v) \end{bmatrix}
\Delta V^T
=
\begin{bmatrix} 0 \\ \Delta S_{uv}(\bar u,\bar v) \end{bmatrix}.
$$

The twist vector may also provide an added shaping handle while
the tangent plane is fixed at a point. This is achieved by augmenting
(9) with a differential constraint for the twist vector (which then
permits us to use the more efficient matrix solution) while setting
the change to the other partials to zero:

$$
\begin{bmatrix} B_0(\bar u) \\ B_1(\bar u) \end{bmatrix}
\Delta V
\begin{bmatrix} B_0(\bar v) \\ B_1(\bar v) \end{bmatrix}^T
=
\begin{bmatrix} 0 & 0 \\ 0 & \Delta S_{uv}(\bar u,\bar v) \end{bmatrix}.
$$

The twist vector can then be arbitrarily rotated or scaled to adjust the
surface. The partials may also be manipulated within the tangent
plane while changing the twist vector. We have not yet found any
geometrically intuitive methods for controlling the twist vector at
this point.

4 Interactive Issues

Now that we can express geometric properties in terms of differential
constraints, we require visual and interactive mechanisms
to present and manipulate these properties. This section describes
numerous methods that we have tried with varying levels of success.
This work is still in progress (particularly three-dimensional
input) and we will no doubt come across more possibilities.

One aspect that needs to be addressed before discussing input devices
is the visual representation of the surface. Our favoured display
representation is a shaded tessellated surface. Unfortunately,
few machines are capable of displaying complex shaded surfaces
at interactive rates. For such machines a wireframe rendering is
usually all that can be supported at such a rate. Aside from resolving
the ambiguity present in wireframe models, a shaded representation
gives a visual representation of the surface over the entire
parametric range. This is in contrast to renderings of isoparametric
curves where the gaps present in the display make selection awkward.
The combination of direct manipulation methods with an
interactive shaded display has been extremely effective.

4.1 Two-dimensional Input

Two-dimensional input devices will generally be less expensive
to manufacture than their three-dimensional counterparts. As a result,
we must consider reasonable methods of controlling surface
geometry with the ever-present mouse if our techniques are to be usable
on conventional workstations and personal computers. There
are a variety of effective two-dimensional input methods and interactive
techniques that are useful, notably [2] and [4]. Our first
application of geometric constraints was naturally tested and debugged
using a simple mouse for input.

Our first issue to be resolved was that of selecting the point on
the surface. What was originally intended as a "quick hack" to get
something working for demonstration purposes has turned out to
be much more useful than expected. This method was to select
the parametric point on the surface by mapping screen space to the
parametric domain of the surface. As the user moves the mouse
on the screen, while pressing a particular mouse button, a position
marker is moved across the surface. This guarantees that a valid
parametric point is always available. When just the position marker
for the point was displayed, this method was not visually interesting
and the presence of poor mapping when the orientation of the object
changed was prevalent. However, once we displayed the normal
vector and tangent plane (either a wire-frame mesh or a transparent
polygon), this method turned into a useful evaluation tool. The
undulating tangent plane gave a good feel for the curvature of the
surface, and sometimes barely visible changes were made obvious.
Prior awkwardness felt in the absence of the tangent plane was tolerated
and virtually ignored as the visual feedback became much
more valuable.
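
A minimal sketch of such a screen-to-parameter mapping follows. It assumes a plain linear map from window coordinates to the parametric rectangle with clamping at the edges; the paper does not give the exact mapping it used, and the function and parameter names are hypothetical.

```python
def screen_to_param(x, y, width, height,
                    u_range=(0.0, 1.0), v_range=(0.0, 1.0)):
    """Map a mouse position in a width x height window to (u, v).

    Positions outside the window are clamped, so a valid parametric
    point is always returned.
    """
    s = min(max(x / (width - 1), 0.0), 1.0)
    t = min(max(y / (height - 1), 0.0), 1.0)
    u = u_range[0] + s * (u_range[1] - u_range[0])
    v = v_range[0] + t * (v_range[1] - v_range[0])
    return u, v
```

Because the map ignores the current view of the object, the marker can move in a direction unrelated to the mouse motion once the object is rotated, which is consistent with the poor mapping noted above.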

In order to experiment with the effects of applying the numerous
systems of differential constraints described, a series of interaction
panels was created, each with a variety of buttons, valuators
and positioners. These were created for us to explore the various
constraint systems described, rather than as intuitive tools for a designer.
Since our primary interest is in three-dimensional input, we
did not go to great effort in pursuing any radical new methods, but
this experimentation did give us the opportunity to compare a few
ideas.

Functionality was grouped into four panels that controlled position,
normal orientation, partial derivative manipulation in the tangent
plane, and the twist vector. Much of the functionality of these
panels involved simple scaling and rotation of vectors, which could
be performed in numerous ways. Separate sliders for rotation about
the three coordinate axes proved awkward (as expected) for arbitrary
vector orientation. A separate 2D positioner for azimuth and
inclination was more successful when the extremities of the panel
were avoided. Chen et al.'s "virtual sphere" [4] provided a much
more intuitive feel than both.

The position panel contained several functions. Displacement
could be applied along any of the coordinate axes, along the normal
or partial derivative vectors, or along a vector whose direction could
be set arbitrarily. The tangent plane could also be either fixed or free
to vary during the displacement. The normal and twist orientation
panels provided simple orientation of their representative vector.
The panel of most interest to us was that which provided control
of the partial derivatives in the current tangent plane, and hence
controlled uniform and directional tension. This panel is illustrated
in Figure 10. Again, although it is not suggested as a tool for a
designer, it did provide us with flexible control of the parameters
at our disposal.

Figure 10: Illustration of a panel used to experiment with
manipulation of the partial derivatives in the tangent plane - not
suggested as geometrically intuitive.

A more intuitive panel for tension control was designed to
present a projection of the tangent plane onto a small window,
along with a few additional controls, as illustrated in Figure 11. The
window contains a vector for the application of directional tension
along with its orthogonal counterpart. These vectors may be selected
and scaled individually, or coupled to apply uniform tension. They
may also be rotated in two modes: one which simply re-orients the
axes for directional tension, another which actually results in the
rotation of the partial derivatives to achieve the twisting effect
previously described. The vectors are simultaneously displayed with the
tangent plane on the three-dimensional view of the surface. When
selecting a vector with the left mouse button, subsequent movement
of the mouse was used to apply tension (set to either directional or
uniform). The scale factor was measured proportional to the
distance of the mouse to the centre of the window. Similarly, selection
and movement with the right mouse button would apply rotation
(set to either orient the axes or actually twist the surface) to these
axes. We will soon be using this window to actually view the
surface beneath the tangent plane as these operations are performed.


4.2  Spatial  Input 

Devices  offering  spatial  input  have  been  available  for  years  now, 
but  have  failed  to  become  widely  accepted.  This  may  be  due  to  a 
combination  of  both  high  cost  and  poor  utilization  of  the  technol¬ 
ogy. 

One such device, the Polhemus 3Space IsoTrak, is “a full six
degree-of-freedom” device, providing information on both position
and orientation. The IsoTrak is a magnetic field device consisting
of a fixed “source” and a movable “sensor.” The sensor has a
working volume consisting of a 30 inch hemisphere. Accuracy
degrades significantly outside this range, or in the presence of other
magnetic devices, e.g. workstation monitors.

A three-dimensional locator allows the space in which a surface
is defined to be mapped into the space about the user’s hand. The
IsoTrak can be hand-held, is also available as part of a digitizing
stylus, and may also be mounted on the back of a VPL DataGlove.
This can permit a designer to move a 3D cursor freely in space and
thus approach the surface from either side. An obvious method for
selecting a point on the surface is to detect intersection of the cursor
with the surface. We have not yet implemented such selection, and
still rely on selection by the surface scanning method previously
described. Rather than use the mouse, though, we use the IsoTrak
as a 2D device, i.e. a tablet, and map the table top of the user’s
work space to the parametric domain of the surface. The height
and orientation data are ignored. Since there is no button of any kind
on the IsoTrak, any character on the keyboard is used to indicate
selection.
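
The tablet-style mapping described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `tabletop_to_parametric`, the rectangular table extents, the default unit parametric ranges, and the clamping behaviour are all hypothetical choices made here.

```python
def tabletop_to_parametric(x, y, table_min, table_max,
                           u_range=(0.0, 1.0), v_range=(0.0, 1.0)):
    """Map a tracker position (x, y) on the table top to surface parameters (u, v).

    Height and orientation are deliberately not arguments: the tracker is
    treated as a 2D tablet. table_min and table_max are the (x, y) corners of
    the assumed rectangular working area on the table.
    """
    (x0, y0), (x1, y1) = table_min, table_max
    # Normalize to [0, 1] and clamp, so a valid parametric point always exists
    # even when the hand strays outside the mapped rectangle.
    s = min(max((x - x0) / (x1 - x0), 0.0), 1.0)
    t = min(max((y - y0) / (y1 - y0), 0.0), 1.0)
    u = u_range[0] + s * (u_range[1] - u_range[0])
    v = v_range[0] + t * (v_range[1] - v_range[0])
    return u, v
```

For example, with a 30 by 20 unit table area, the position (15, 5) maps to the parametric point (0.5, 0.25). Mapping only a small rectangle around the point of interest is one way to limit the effect of tracker noise.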

Once selected, the position of the surface is naturally displaced
by mapping the coordinate system of the IsoTrak to correspond with
the current view (or a selected view, if more than one is present)
of the surface (thus left-to-right movement of the hand corresponds
to left-to-right displacement of the surface). The change to the
orientation vector maps naturally to a change in normal vector. Note
that if the user intends to displace the surface downward in the
current view, selection must be done while the hand is at a sufficient
height above the table top to allow specification of the desired
displacement. The user’s work space may also be mapped to the local
coordinate system of a chosen surface if desired. The advantage
here is that the changes are relative to the surface. The IsoTrak can
thus be laid to rest on a table top, oriented at an angle that allows
most comfortable application of the desired change.

The twisting of the partial derivatives about the tangent
plane is controlled by simply rotating the IsoTrak while maintaining
its “up-vector.” We do not provide any control of tension in
this “mode” of operation as yet.

This relative method has proven to be much easier than maintaining
absolute position and orientation at the selected point, as
originally attempted. The freedom to rest and rotate the hand
comfortably is much preferable to reaching out into space with the hand
at awkward angles. Due to the limited accuracy of the IsoTrak, it is
best to map only a local area about the point of deformation into the
sensor’s work space so that noise does not cause large unexpected
perturbations of the surface.

4.3  A  “Virtual  Hand” 

Our primary goal is to develop a direct manipulation sculpting
environment utilizing three-dimensional display and a pair of “virtual
hands.” We are currently using one of a pair of VPL DataGloves
with our direct surface manipulation methods. Each DataGlove
consists of a thin glove mounted with a 3Space IsoTrak and
a set of optical fibers (referred to as flex sensors) measuring two
joint angles for each finger and the thumb. The angles of abduction
between the fingers (and the thumb) are not measured by the
DataGlove.

Although the DataGloves are still a long way from providing the
full flexibility of the human hand, the joint sensors have provided
us with additional shaping methods to those described for the
IsoTrak. The vast majority of work with these gloves, including [13],
is through the use of gestures. We prefer to use the right hand purely
as a shaping tool - free of the burden of making gestures - and use
the keyboard (eventually gestures from a glove on the left hand) to
initiate and terminate actions. Voice recognition, as used in [13],
would be an even more preferable mechanism, freeing both hands for
shaping.

Since the DataGlove is mounted with an IsoTrak, we inherit the
functionality described in Section 4.2. What we lacked with the
IsoTrak was a means of applying uniform and directional tension.
We can employ the flex sensors of the DataGlove for this purpose.
We use the flex sensors of the four fingers to apply tension along one
axis, and the flex sensors of the thumb to apply tension along the
orthogonal axis. A clenching of the fist flexes all joints simultaneously
and thus results in the application of (approximate) uniform
tension. We eliminate the twisting of the surface about the normal,
and instead use that degree of freedom from the IsoTrak to alter
these axes of directional tension. In practice, the thumb is rarely
used independently of the fingers - it seems more natural to rotate
the hand to re-orient the required axis along the fingers.

Currently the values of the joint angles involved are simply
averaged to determine the scale factors to be applied. This helps to
smooth out some of the noise present in the flex sensors, which at times
can be extreme. The inner sensors control a greater rate of “squeezing”
while we attempt to gain finer control from the flex sensors of
the outer joints, although the noise and non-linearity of these
sensors make this difficult to achieve.
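
The averaging step can be illustrated with a short sketch. It is an assumption-laden illustration rather than the method actually used: the function name `tension_scales`, the `rest_angle` (the mean flex at the moment of selection) and the linear `gain` are hypothetical, and the different weighting of inner and outer joints mentioned above is not modelled.

```python
def tension_scales(finger_angles, thumb_angles, rest_angle=45.0, gain=0.01):
    """Return (finger_axis_scale, thumb_axis_scale) from flex-sensor joint angles.

    finger_angles: joint angles (degrees) from the four fingers.
    thumb_angles:  joint angles (degrees) from the thumb.
    Averaging the readings smooths some of the sensor noise. A scale of 1.0
    means "no change"; flexing beyond rest_angle increases the scale and
    relaxing below it decreases the scale.
    """
    finger_mean = sum(finger_angles) / len(finger_angles)
    thumb_mean = sum(thumb_angles) / len(thumb_angles)
    finger_scale = 1.0 + gain * (finger_mean - rest_angle)
    thumb_scale = 1.0 + gain * (thumb_mean - rest_angle)
    return finger_scale, thumb_scale
```

Starting from a loosely clenched hand (readings near `rest_angle`) leaves room to both increase and decrease the tension, and clenching the whole fist moves both scales together, which approximates uniform tension.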

The keyboard must currently be used to control modes of operation.
When the hand is fully clenched and more tension is desired,
a “clutch” must be used to release the hand from the surface so that
it can be reclenched to apply further tension. We suggest that the
hand be loosely clenched when selection takes place so that tension
can be both increased and decreased by moderate amounts.

This technique at times shows great promise, but at other times
poor calibration of the glove and the instability of the flex sensors
make it difficult to gain fine control of the tension. We feel that the
approach is sound and that, given a more accurate and stable device,
it will be very useful. Noise is present in both the IsoTrak and the
flex sensors of the DataGlove. The noise in the flex sensors makes
accurate “moulding” virtually impossible.

5  Future  Work 

The current direct manipulation B-spline modeler runs on an
SGI IRIS 4D/340VGX. We are capable of direct surface manipulation
with shaded display at reasonable interactive rates (i.e. several
frames per second). Surface position and tangency can be manipulated
using a mouse, Polhemus 3Space IsoTrak, or VPL DataGlove,
as described.

A couple of caveats may become apparent in our description of
the direct manipulations. These manipulations are local in nature,
relying on the support of the basis functions to determine their area
of effect. As a result, one of our geometric manipulations applied
in a highly refined region will have little effect. They are currently
only of use for local deformations. Since our goal is to rid the
designers of dependencies upon the underlying representation, such
direct manipulations must be applicable over a user-defined area.
We are currently implementing a method to provide this regional
control.

Another problem is the asymmetry that can result when the knot
spacing is highly non-uniform. Once a mechanism is in place for
applying direct manipulation over arbitrary regions, we can add
additional knots to balance out the parametric spacing, then deal with
the added knots in a well-behaved manner. As part of our regional
control scheme, we are extending the work of Forsey and Bartels
[9] to avoid insertion of entire knot lines that can lead to data
explosion.

With  direct  manipulation  and  regional  control,  we  will  have  gone 
a  long  way  in  reducing  the  basis-dependent  attributes  in  the  interac¬ 
tion  with  tensor  product  surfaces.  We  are  also  looking  into  extend¬ 
ing  our  list  of  geometric  constraints  to  provide  intuitive  control  of 
surface  curvature.  We  will  continue  to  pursue  interactive  schemes 
for  “virtual  hands,”  particularly  as  devices  more  sophisticated  than 
the  DataGlove  become  available. 

References 

[1] Richard H. Bartels and John C. Beatty. A Technique for the
Direct Manipulation of Spline Curves. In Graphics Interface
'89, pages 33-39, London, Ontario, 1989. Morgan Kaufmann
Publishers, Palo Alto, California.

[2] Eric A. Bier. Snap Dragging: Interactive Geometric Design
in Two and Three Dimensions. Technical Report EDL-89-2,
Palo Alto Research Center, Xerox Corporation, Palo Alto,
California, September 1989.

[3] George Celniker and Dave Gossard. Deformable Curve and
Surface Finite-Elements for Free-form Shape Design. Computer
Graphics [SIGGRAPH '91 Conference Proceedings],
25(4):257-266, 1991.

[4] Michael Chen, S. Joy Mountford, and Abigail Sellen. A Study
in Interactive 3-D Rotation Using 2-D Control Devices. Computer
Graphics [SIGGRAPH '88 Conference Proceedings],
22(4):121-129, 1988.

[5] J. H. Clark. Designing Surfaces in 3D. Communications of
the ACM, 19(8):454-460, 1976.

[6] Elizabeth S. Cobb. Design of Sculptured Surfaces using the
B-Spline Representation. PhD thesis, Department of Computer
Science, University of Utah, Salt Lake City, Utah, 1984.

[7] Sabine Coquillart. Extended Free-Form Deformation: A
Sculpturing Tool for 3D Geometric Modeling. Computer
Graphics [SIGGRAPH '90 Conference Proceedings],
24(4):187-196, 1990.

[8] Gerald Farin. Curves and Surfaces for Computer Aided
Geometric Design. Academic Press, San Diego, California,
second edition, 1990.

[9] David R. Forsey and Richard H. Bartels. Hierarchical
B-Spline Refinement. Computer Graphics [SIGGRAPH '88
Conference Proceedings], 22(4):205-212, 1988.

[10] Barry Fowler and Richard Bartels. Constraint Based Curve
Manipulation. IEEE Computer Graphics and Applications,
1992. Submitted for publication.

[11] Charles Loop and Tony DeRose. Generalized B-spline
Surfaces of Arbitrary Topology. Computer Graphics [SIGGRAPH
'90 Conference Proceedings], 24(4):347-355, 1990.

[12] Thomas W. Sederberg and Scott R. Parry. Free-form
Deformation of Solid Geometric Models. Computer Graphics
[SIGGRAPH '86 Conference Proceedings], 20(4):151-160, 1986.

[13] David Weimer and S. K. Ganapathy. A Synthetic Visual
Environment with Hand Gesturing and Voice Input. SIGCHI
Bulletin [Human Factors in Computing Systems - CHI '89
Conference Proceedings], pages 235-240, 1989.

[14] William Welch, Andrew Gleicher, and Andrew Witkin.
Manipulating Surfaces Differentially. Proceedings,
Compugraphics '91, September 1991. Also available from Carnegie
Mellon University as Technical Report CMU-CS-91-175.




COMPUTER  INTERACTIVE  SCULPTURE 


Helaman  Ferguson 
Supercomputing  Research  Center 


Central Purpose


As a sculptor I want to experience and avail to others
vital compelling forms. I desire access to quantitative
measured forms as well as qualitative expression.
Computers offer powerful tool possibilities. Other sculptors
find this so, cf., [4, 9, 14]. It is not enough for me to
make models of mathematical equations or CAD
structures, although the capability to do that is sometimes
important. I invest my sculpture with a wide range of
knowledge. My sculpture process tends to involve direct
carving or cutting away of material. It is more
fashionable in sculpture today to do constructions or addition.
I prefer the more interesting and difficult subtraction
processes. While I make aesthetic artifacts, many of
our functional artifacts are made by industrial cutting
processes that are relevant to me.

As a research mathematician I have had the good
fortune to discover mathematics as a design language
for sculpture, cf., [3, 15]. My use of this design language
folds naturally into our current computer technology.
Mathematics is an invisible art form of profound social
and scientific significance. Computer graphics makes
mathematics visible. I take the next step.

In this paper I discuss two of my successful sculptural
forms, Umbilic Torus NC and Umbilic Torus NIST. I
have done a series of each using two different kinds of
computer interaction. These and my related sculptures
are in permanent collections, e.g., [10, 11, 12, 13] and
have been exhibited widely, e.g., [6, 7, 8].


Mathematical Design


I begin by plunging directly into the design
considerations for Umbilic Torus NC and Umbilic Torus NIST.
This looks like raw mathematics but it is not in the
usual sense because my motivations in writing it down
are sculptural, cf., [3, 5]. The setting is the
stratification of the space $\mathbf{R}^4$ of real coefficients of binary cubic
forms by the action of the general linear group.
Stratification means the orbits and the relationship among
them. The correspondence between the points in $\mathbf{R}^4$
and the cubic forms is given by

$$(a, b, c, d) \in \mathbf{R}^4 \;\leftrightarrow\; f = ax^3 + bx^2y + cxy^2 + dy^3.$$


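The general real linear group $G = GL(2, \mathbf{R})$ consists
of the real invertible $2 \times 2$ matrices
$\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$, where
the condition of invertibility of this matrix is that the
determinant $\alpha\delta - \beta\gamma$ be non-zero. These matrices will
be regarded as acting on the two variable vector $(x, y)$
as a column vector by left matrix multiplication. The
group action is defined by the mapping

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

or

$$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \alpha x + \beta y \\ \gamma x + \delta y \end{pmatrix}.$$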

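This in turn gives an action on cubic forms by
substituting these images in the cubic form and multiplying
out. Thus the form $ax^3 + bx^2y + cxy^2 + dy^3$ becomes

$$\begin{aligned}
&(a\alpha^3 + b\alpha^2\gamma + c\alpha\gamma^2 + d\gamma^3)\,x^3 \\
&+ (3a\alpha^2\beta + b\alpha^2\delta + 2b\alpha\beta\gamma + 2c\alpha\gamma\delta + c\beta\gamma^2 + 3d\gamma^2\delta)\,x^2y \\
&+ (3a\alpha\beta^2 + 2b\alpha\beta\delta + b\beta^2\gamma + c\alpha\delta^2 + 2c\beta\gamma\delta + 3d\gamma\delta^2)\,xy^2 \\
&+ (a\beta^3 + b\beta^2\delta + c\beta\delta^2 + d\delta^3)\,y^3.
\end{aligned}$$

These four coefficients are linear in the four original
coefficients $a, b, c, d$ and define a $4 \times 4$ matrix

$$\begin{pmatrix}
\alpha^3 & \alpha^2\gamma & \alpha\gamma^2 & \gamma^3 \\
3\alpha^2\beta & \alpha^2\delta + 2\alpha\beta\gamma & 2\alpha\gamma\delta + \beta\gamma^2 & 3\gamma^2\delta \\
3\alpha\beta^2 & 2\alpha\beta\delta + \beta^2\gamma & \alpha\delta^2 + 2\beta\gamma\delta & 3\gamma\delta^2 \\
\beta^3 & \beta^2\delta & \beta\delta^2 & \delta^3
\end{pmatrix}.$$

This matrix has the determinant $(\alpha\delta - \beta\gamma)^6$, which is
the sixth power of the determinant of the original real
invertible $2 \times 2$ matrix. This new $4 \times 4$ matrix is
invertible if and only if the original matrix it is representing
is invertible. This $4 \times 4$ matrix is an important example
of a group representation.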
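
As a numerical cross-check of this construction, the sketch below builds the $4 \times 4$ matrix for given entries of the $2 \times 2$ matrix, applies it to a coefficient vector, and computes its determinant. The function names (`cubic_action_matrix`, `transform_coefficients`, `eval_cubic`, `det`) are illustrative assumptions, and exact integer arithmetic is used so the sixth-power relation can be tested without rounding.

```python
def cubic_action_matrix(al, be, ga, de):
    """4x4 matrix sending (a, b, c, d) to the coefficients of the form
    obtained by substituting x -> al*x + be*y, y -> ga*x + de*y."""
    return [
        [al**3, al**2 * ga, al * ga**2, ga**3],
        [3 * al**2 * be, al**2 * de + 2 * al * be * ga,
         2 * al * ga * de + be * ga**2, 3 * ga**2 * de],
        [3 * al * be**2, 2 * al * be * de + be**2 * ga,
         al * de**2 + 2 * be * ga * de, 3 * ga * de**2],
        [be**3, be**2 * de, be * de**2, de**3],
    ]

def transform_coefficients(coeffs, al, be, ga, de):
    """Coefficients of the substituted cubic form."""
    m = cubic_action_matrix(al, be, ga, de)
    return [sum(m[i][j] * coeffs[j] for j in range(4)) for i in range(4)]

def eval_cubic(coeffs, x, y):
    """Evaluate a*x^3 + b*x^2*y + c*x*y^2 + d*y^3."""
    a, b, c, d = coeffs
    return a * x**3 + b * x**2 * y + c * x * y**2 + d * y**3

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total
```

For instance, with $(\alpha, \beta, \gamma, \delta) = (2, 1, -1, 3)$ the $2 \times 2$ determinant is 7, and evaluating the transformed form at $(x, y)$ agrees with evaluating the original form at $(\alpha x + \beta y, \gamma x + \delta y)$.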

The cubic $ax^3 + bx^2y + cxy^2 + dy^3$ can be completely
factored over the complex numbers $\mathbf{C}$,

$$ax^3 + bx^2y + cxy^2 + dy^3 = (r_1x + s_1y)(r_2x + s_2y)(r_3x + s_3y),$$

for $r_1, s_1, r_2, s_2, r_3, s_3 \in \mathbf{C}$. This gives three ratios or
lines given by the pairs $r_j, s_j$, $j = 1, 2, 3$. We can think of
them as lines because non-zero scaling of the form
corresponds to non-zero scaling of the pairs. The ratios
correspond to roots of the cubic and can be classified into
five types: hyperbolic umbilics, two complex, one real
root, e.g., $x^3 + y^3$; elliptic umbilics, three real distinct
roots, e.g., $x^3 - 3xy^2$; parabolic umbilics, three real,
two equal roots, e.g., $x^2y$; exceptionals, three real equal
roots, e.g., $x^3$; and the origin. These root types
correspond to orbits of the group of dimensions 4, 4, 3, 2, 0
respectively.

The discriminant of the cubic is an invariant under
the linear changes of variable we have been considering.
The cubic discriminant is

$$-b^2c^2 + 4ac^3 + 4b^3d - 18abcd + 27a^2d^2.$$

The parabolic umbilics are those $(a, b, c, d)$ such that

$$-b^2c^2 + 4ac^3 + 4b^3d - 18abcd + 27a^2d^2 = 0.$$

The ‘hyperbolic umbilics at infinity’ and ‘elliptic
umbilics at infinity’ are those $(a, b, c, d)$ such that

$$-b^2c^2 + 4ac^3 + 4b^3d - 18abcd + 27a^2d^2 > 0$$

$$-b^2c^2 + 4ac^3 + 4b^3d - 18abcd + 27a^2d^2 < 0$$

respectively. The root types and orbits amount to the
same things. The discriminant is homogeneous in the
four variables so it suffices to look at orbits of forms
represented by points on the 3-sphere,

$$\{[a, b, c, d] \mid a^2 + b^2 + c^2 + d^2 = 1\}.$$

This reduces the dimensions above to 3, 3, 2, 1, ignoring
the origin. The situation is now in a sculpture
appropriate space.
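
The sign convention can be checked on the example forms named earlier. The sketch below is illustrative only; the function names are assumptions, and the classification uses the convention that three distinct real roots give a negative value of this expression (as for $x^3 - 3xy^2$) while one real root gives a positive value (as for $x^3 + y^3$).

```python
def cubic_discriminant(a, b, c, d):
    """The expression -b^2 c^2 + 4 a c^3 + 4 b^3 d - 18 a b c d + 27 a^2 d^2."""
    return (-(b * b * c * c) + 4 * a * c**3 + 4 * b**3 * d
            - 18 * a * b * c * d + 27 * a * a * d * d)

def classify(a, b, c, d):
    """Coarse root type of a x^3 + b x^2 y + c x y^2 + d y^3."""
    disc = cubic_discriminant(a, b, c, d)
    if disc < 0:
        return "elliptic"        # three distinct real roots
    if disc > 0:
        return "hyperbolic"      # one real root, two complex roots
    return "discriminant variety"  # at least a double root
```

The zero case lumps together the parabolic umbilics, the exceptionals and the origin; separating them needs more than the sign of this one expression.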

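In the complex number representation for the real
cubic forms, the four real coefficients can be replaced
by two complex numbers. Consider the real part of the
complex cubic form

$$\Re\left(uz^3 + vz^2z^*\right),$$

where $z = x + yi$, the complex conjugate $z^* = x - yi$, and
$u = u_1 + u_2i$, $v = v_1 + v_2i$. The linear transformation
relating the $a, b, c, d$ and the $u, v$ coefficient sets

$$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} =
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & -3 & 0 & -1 \\ -3 & 0 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ v_1 \\ v_2 \end{pmatrix}$$

has determinant 16 and inverse

$$\frac{1}{4}\begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & -1 & 0 & 1 \\ 3 & 0 & 1 & 0 \\ 0 & -1 & 0 & -3 \end{pmatrix}.$$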
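
The change of coefficients can be exercised with Python's built-in complex numbers. This is a small sketch with hypothetical function names; it applies the $4 \times 4$ matrix and its inverse row by row and compares the real cubic form with the real part of the complex form at a sample point.

```python
def uv_to_abcd(u, v):
    """Real coefficients (a, b, c, d) of Re(u z^3 + v z^2 z*), z = x + y i."""
    u, v = complex(u), complex(v)
    u1, u2, v1, v2 = u.real, u.imag, v.real, v.imag
    return (u1 + v1, -3 * u2 - v2, -3 * u1 + v1, u2 - v2)

def abcd_to_uv(a, b, c, d):
    """Inverse map: the complex pair (u, v) for a real cubic form."""
    u = complex((a - c) / 4, (-b + d) / 4)
    v = complex((3 * a + c) / 4, (-b - 3 * d) / 4)
    return u, v

def real_form(u, v, x, y):
    """Evaluate Re(u z^3 + v z^2 z*) at z = x + y i."""
    z = complex(x, y)
    return (u * z**3 + v * z * z * z.conjugate()).real
```

With these definitions, $(u, v) = (1, 0)$ gives $(a, b, c, d) = (1, 0, -3, 0)$, that is $x^3 - 3xy^2$, and $(u, v) = (0, 1)$ gives $(1, 0, 1, 0)$, that is $x^3 + xy^2$.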

There are two interesting planes of forms here, $u = 0$
or the $v$-plane or the $\Re(vz^2z^*)$ form, and $v = 0$ or
the $u$-plane or the $\Re(uz^3)$ form. They are interesting
because the group contains the rotations $e^{i\theta}$, which
act on each of these forms and corresponding planes in
a simple way. At least it looks simple when written as
complex multiplication instead of matrix multiplication.
Here is the action of this circle group on the form as it
acts by complex number multiplication

$$e^{i\theta} : z \mapsto e^{i\theta}z,$$

to give

$$e^{i\theta} : \Re\left(uz^3 + vz^2z^*\right) \mapsto \Re\left(ue^{3i\theta}z^3 + ve^{i\theta}z^2z^*\right).$$

Geometrically this means

$$(u, v) \mapsto (ue^{3i\theta}, ve^{i\theta}),$$

or that $u$ gets rotated thrice whilst $v$ is rotated once.
We will use this observation below twice.

Consider the plane $u = 1$, the unit translate of the
$v$-plane, or the real cubic forms $\Re(z^3 + vz^2z^*)$. The
question is how does the discriminant variety intersect
this plane or these forms. The discriminant variety
consists of those forms having double roots at
least. For which $v$'s will there be double roots? Since
$\Re(z^3 + vz^2z^*)$ is homogeneous in $z$ and $z = 0$ is
accounted for, we may suppose that the roots have absolute
value one, $|z| = 1$, or that $z = e^{i\theta}$. In this case

$$\Re\left(z^3 + vz^2z^*\right) = \tfrac{1}{2}\left(e^{3i\theta} + e^{-3i\theta} + ve^{i\theta} + v^*e^{-i\theta}\right).$$

For what $v$'s does this form have a double root and
hence be on the discriminant variety? This is the same
form after multiplying by $2e^{i\theta}$ to get

$$e^{4i\theta} + e^{-2i\theta} + ve^{2i\theta} + v^*.$$

The derivative of this form is supposed to vanish since
we want $v$'s giving a double root, so, after solving for $v$,
we have

$$v = -2e^{2i\theta} + e^{-4i\theta}.$$

A more recognizable version of this equation is had by
rewriting the variable $\theta = (\varphi + \pi)/2$,

$$v = 2e^{i\varphi} + e^{-2i\varphi}, \quad 0 \le \varphi < 2\pi.$$

This is the locus of a point on a circle of radius 1 rolling
inside a circle of radius 3, otherwise known as a
hypocycloid of three cusps. This includes the case of all three
roots of the cubic form being identical. In this case the
second derivative of the form vanishes,

$$\frac{dv}{d\varphi} = 2ie^{i\varphi} - 2ie^{-2i\varphi},$$

which when set equal to zero gives

$$e^{3i\varphi} = 1 = e^{2\pi ik}$$

and

$$v = 3e^{2\pi ik/3}, \quad k \in \mathbf{Z},$$

which latter is the set of the three cube roots of unity
scaled by three.

These scaled cube roots of unity are cusps of the
curve because a tangent line is defined there as
everywhere else on the curve, but there is no tangent circle
defined there as there is everywhere else.
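
The double-root property of this curve can be checked numerically. The sketch below is a hedged illustration, not part of the original derivation: the function names are assumptions, and it takes the double root to sit at $\theta = (\varphi + \pi)/2$, one choice for which $-2e^{2i\theta} + e^{-4i\theta}$ equals $2e^{i\varphi} + e^{-2i\varphi}$.

```python
import cmath
import math

def deltoid(phi):
    """Point v = 2 e^{i phi} + e^{-2 i phi} on the hypocycloid of three cusps."""
    return 2 * cmath.exp(1j * phi) + cmath.exp(-2j * phi)

def form_and_derivative(v, theta):
    """F(theta) = Re(z^3 + v z^2 z*) on |z| = 1, together with dF/dtheta.

    On the unit circle z^2 z* = z, so F = Re(e^{3 i theta} + v e^{i theta}).
    """
    e1 = cmath.exp(1j * theta)
    e3 = cmath.exp(3j * theta)
    value = (e3 + v * e1).real
    slope = (3j * e3 + 1j * v * e1).real
    return value, slope
```

For every $\varphi$ both returned numbers vanish (up to rounding) at $\theta = (\varphi + \pi)/2$, which is what a double root means, and at $\varphi = 2\pi k/3$ the point `deltoid(phi)` is three times a cube root of unity.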


Figure 1. Hypocycloid of Three Cusps


Cycloids  in  general  are  defined  in  terms  of  a  circle 
of  radius  B  rolling  without  slippage  inside  or  outside 
a  circle  of  radius  A.  The  equation  in  complex  variable 
form  is 

$z = (A + \ldots$

where $B > 0$ gives an epicycloid (the smaller circle
rolling outside the larger circle) and $B < 0$ gives a
hypocycloid (the smaller circle rolling inside the larger
circle). If $B = 0$ we just get the point $z = 0$. If
$(A, B) = (3, -1)$ we get the hypocycloid of three cusps
above. If $(A, B) = (1, 1)$ we get the cardioid which we
shall see below.
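
For experimentation, one common complex parametrization of these rolling-circle curves is sketched below. It is an assumption, not a reconstruction of the equation in this paper: it requires $B \neq 0$, and it is chosen so that $(A, B) = (3, -1)$ reproduces $2e^{i\varphi} + e^{-2i\varphi}$.

```python
import cmath

def cycloid(A, B, phi):
    """Point on the curve traced by a circle of radius |B| rolling on a circle
    of radius A: outside for B > 0 (epicycloid), inside for B < 0 (hypocycloid).

    Uses the standard form z = (A + B) e^{i phi} - B e^{i (A + B) phi / B}.
    """
    if B == 0:
        raise ValueError("this parametrization needs a rolling circle, B != 0")
    return (A + B) * cmath.exp(1j * phi) - B * cmath.exp(1j * (A + B) * phi / B)
```

With $(A, B) = (1, 1)$ this gives the cardioid $2e^{i\varphi} - e^{2i\varphi}$, whose single cusp is at $z = 1$.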

We have chosen to look at those of the form
$\Re(z^3 + vz^2z^*)$. This says that the form $(u, v) = (1, 0)$
or $(a, b, c, d) = (1, 0, -3, 0)$ is included, which has
negative discriminant $-108$; the form has three
distinct real roots and is therefore an elliptic umbilic. The
hypocycloid we have discovered lies in the plane given
by $u = 1$ or the $v$-plane. This plane gets moved by
the unit circle group. Recall that $u$ rotates thrice while $v$
rotates once. Otherwise said, one third rotation of $v$ while $u$
rotates once. This means that the hypocycloid rotates one cusp
over while $u = 1$ moves to $u = e^{2\pi i}$. The hypocycloid
has three fold symmetry, so the resulting surface closes.
We are looking at this torus with a hypocycloid
cross-section from the point of view of the $(e^{3i\theta}, v)$ geometry.
This torus does indeed present us with a picture of the
parabolic umbilic surface, but keep in mind that we are
looking at this singular set from a chosen perspective,
one where the elliptic umbilic point $(u, v) = (1, 0)$ and
all other elliptic umbilic points are inside the bounded
part of the hypocycloid; the hyperbolic umbilic points
are all outside the hypocycloid and this is the
unbounded part of the space.

Could we choose instead of $(u, v) = (1, 0)$ the case
of $(u, v) = (0, 1)$? This gives us the forms in the plane
$v = 1$, the unit translate of the $u$-plane, or the real
cubic forms $\Re(uz^3 + z^2z^*)$. The question is how does
the discriminant variety intersect this plane or these
forms. The discriminant variety consists of those forms
having double roots at least. For which $u$'s will there be
double roots? Since $\Re(uz^3 + z^2z^*)$ is homogeneous in
$z$ and $z = 0$ is accounted for, we may suppose that the
roots have absolute value one, $|z| = 1$, or that $z = e^{i\theta}$.
In this case

$$\Re\left(uz^3 + z^2z^*\right) = \tfrac{1}{2}\left(ue^{3i\theta} + u^*e^{-3i\theta} + e^{i\theta} + e^{-i\theta}\right).$$

For what $u$'s does this form have a double root and
hence be on the discriminant variety? This is the same
form after multiplying by $2e^{3i\theta}$ to get

$$ue^{6i\theta} + u^* + e^{4i\theta} + e^{2i\theta}.$$

The derivative of this form is supposed to vanish since
we want $u$'s giving a double root, so, after solving for
$u$, we have

$$-3u = 2e^{-2i\theta} + e^{-4i\theta}.$$

To recognize this equation as a cycloid rewrite the
variables $\theta = -\varphi/2$ and $u \to -u/3$ to get

$$u = 2e^{i\varphi} + e^{2i\varphi}, \quad 0 \le \varphi < 2\pi.$$

This is the locus of a point on a circle of radius 1 rolling
outside a circle of radius 1, otherwise known as an
epicycloid of one cusp, or a cardioid. This includes the case
of all three roots of the cubic form being identical. In
this case the second derivative of the form vanishes,

$$\frac{du}{d\varphi} = 2ie^{i\varphi} + 2ie^{2i\varphi},$$

which when set equal to zero gives

$$e^{i\varphi} = -1 = e^{(2k+1)\pi i}$$

and

$$\varphi = (2k + 1)\pi$$

for $k$ in the integers $\mathbf{Z}$, and $u$ is the single point $-1$,
that is, $\tfrac{1}{3}$ before the rescaling.

This scaled fraction point is a cusp of the curve
because a tangent line is defined everywhere on the curve,
and there is a tangent circle defined everywhere but at
that point.


Figure  2.  Epicycloid  of  One  Cusp  or  a  Cardioid 

This time we have chosen to look at those of the form
$\Re(uz^3 + z^2z^*)$. This says that the form $(u, v) = (0, 1)$
or $(a, b, c, d) = (1, 0, 1, 0)$ is included, which has
positive discriminant 4; the form has one real root and
two distinct complex roots and is therefore a
hyperbolic umbilic. The epicycloid we have discovered lies in
the plane given by $v = 1$ or the $u$-plane. This plane
also gets moved by the unit circle group. Recall that $u$
rotates thrice while $v$ rotates once. Otherwise said, one third
rotation of $v$ while $u$ rotates once. This means that the
epicycloid in the $u$-plane rotates once while $v = 1$ moves to
$v = e^{2\pi i/3}$. The epicycloid has bilateral symmetry, and
this puts the epicycloid in the same position. Moving
through the next two thirds puts the epicycloid back
twice again to its original position. We are looking
at this torus with an epicycloid cross-section from the
point of view of the $(u, e^{i\theta})$ geometry. This torus does
indeed present us with a picture of the parabolic
umbilic surface, but keep in mind that we are looking at
this singular set from a chosen perspective, one where
the hyperbolic umbilic point $(u, v) = (0, 1)$ and all other
hyperbolic umbilic points are inside the bounded part
of the epicycloid; the elliptic umbilic points are all
outside the epicycloid and they form the unbounded part
of the space of forms.

I  have  summarized  symbolically  a  collection  of  per¬ 
haps  10/f  years  of  some  of  the  most  important  ideas  in 
mathematics.  Indeed,  many  of  these  ideas  are  founda¬ 
tion  stones  of  contemporary  computer  graphics.  I  have 
reformulated  them  in  a  way  to  express  them  in  three 
dimensional  or  physical  materials.  These  are  priceless 
ideas  which  I  work  into  otherwise  worthless  stone  or 
bronze. 

NC:  Numerical  Control 

The initials ‘NC’ represent ‘numerically controlled’,
a phrase which originated in 1952 at the Servomechanisms
Laboratory of MIT, which was subcontracted by
Parsons, Inc., who was commissioned by the U.S. Air
Material Command to automate helicopter rotor blade
manufacture. SL-MIT modified a Cincinnati Hydrotel
milling machine to operate from binary punched tape.
Umbilic Torus NC is more complex than the early rotor
blades, but owes its existence to the continued
development of this technology.

By 1988 when I was ready to do the Umbilic Torus
NC at the Brigham Young University Robotics
Laboratory, the largest machine available there was a Cartesian
3-axis Kearney & Trecker VB-2. This machine could
still read paper tape. Fortunately it was also interfaced
with a PC. As it was we had to install an on-board hard
disc to accommodate all of the quill moves for the
Umbilic Torus NC. The source of the data load arose from
the fact that while the Umbilic Torus NC has spatial
symmetry, that symmetry is not particularly compatible
with the Cartesian structure of the 3-axis machine.

The central problem in any NC application is tool
path. The ball end mill approximates a sphere of positive
radius, and tool offsets had to be computed in advance
whatever tool path was selected. Since the Umbilic
Torus NC is a complex surface, the normals to this
surface at any point were computed symbolically and
parametrically prior to computing the tool offsets.
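
The offset computation can be sketched as follows. This is only an illustration under stated assumptions: the surface is an ordinary circular torus standing in for the Umbilic Torus NC (whose parametric equations are not reproduced here), the normal is estimated numerically rather than symbolically, and the names `torus_point`, `unit_normal` and `tool_center` are hypothetical. The idea is that a ball end mill of radius r touching the surface at a point p has its center at p + r n, where n is the unit outward normal at p.

```python
import math

def torus_point(u, v, R=3.0, a=1.0):
    """Stand-in surface: a circular torus (not the Umbilic Torus equations)."""
    return ((R + a * math.cos(v)) * math.cos(u),
            (R + a * math.cos(v)) * math.sin(u),
            a * math.sin(v))

def unit_normal(surface, u, v, h=1e-6):
    """Unit normal from the cross product of central-difference partials."""
    pu1, pu0 = surface(u + h, v), surface(u - h, v)
    pv1, pv0 = surface(u, v + h), surface(u, v - h)
    du = [(p - q) / (2 * h) for p, q in zip(pu1, pu0)]
    dv = [(p - q) / (2 * h) for p, q in zip(pv1, pv0)]
    n = (du[1] * dv[2] - du[2] * dv[1],
         du[2] * dv[0] - du[0] * dv[2],
         du[0] * dv[1] - du[1] * dv[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def tool_center(surface, u, v, tool_radius):
    """Center of a ball end mill of the given radius touching the surface at (u, v)."""
    p = surface(u, v)
    n = unit_normal(surface, u, v)
    return tuple(pc + tool_radius * nc for pc, nc in zip(p, n))
```

For this stand-in torus the cross product of the u and v partials points outward, so the tool center lies outside the surface; a different parametrization may need the sign of the normal flipped.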

For aesthetic and, it turned out, practical reasons, cf.
[)], I selected the tool path for the Umbilic Torus NC to
be a surface filling curve. The parametric single domain
for the vector valued function defining the Umbilic Tori
is essentially a square. It is not enough to define this
domain square with inequalities. To machine the torus,
points within the square domain have to be accessed in
an ordered path. This ordered path is to be followed
by a cutting tool. The surface filling curves offer effective
ways of covering a single domain square and hence
covering the image object.


Figure  3.  Hilbert  Surface  Filling  Curve  in 
the  Domain  of  the  Umbilic  Torus  NC 


The Hilbert curve, which is a 2-adic version of
Peano’s suite of q-adic curves, is defined (up to scale)
recursively by the following set of ordered points. Begin
with H[0] = (0,0). Then for n ≥ 1, define

$$
H[n] = H[n-1]\begin{pmatrix}0&1\\1&0\end{pmatrix}
\;\cup\; \bigl((2^{n-1},\,0) + H[n-1]\bigr)
\;\cup\; \bigl((2^{n-1},\,2^{n-1}) + H[n-1]\bigr)
\;\cup\; \Bigl((2^{n-1}-1,\,2^{n}-1) - H[n-1]\begin{pmatrix}0&1\\1&0\end{pmatrix}\Bigr)
$$

where ∪ is an ordered union of the sets of points. The
straight line segments which join the points in order
become space curves on the Umbilic Torus NC. Further
details can be found in [4, 5].
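
A minimal sketch of this recursion, reading it as four copies of the previous level: the first with its coordinates swapped, the second and third translated by (2^(n-1), 0) and (2^(n-1), 2^(n-1)), and the fourth a swapped copy subtracted from (2^(n-1) - 1, 2^n - 1). That reading is an assumption recovered from the printed formula, and the function name `hilbert_points` is hypothetical.

```python
def hilbert_points(n):
    """Ordered lattice points H[n], as (x, y) tuples on a 2^n by 2^n grid."""
    pts = [(0, 0)]
    for level in range(1, n + 1):
        h = 2 ** (level - 1)
        swapped = [(y, x) for x, y in pts]  # H[level-1] times the swap matrix
        first = swapped
        second = [(h + x, y) for x, y in pts]
        third = [(h + x, h + y) for x, y in pts]
        fourth = [(h - 1 - x, 2 * h - 1 - y) for x, y in swapped]
        pts = first + second + third + fourth  # ordered union
    return pts

def to_unit_square(pts, n):
    """Scale the lattice points into the unit square used as a parameter domain."""
    side = 2 ** n
    return [((x + 0.5) / side, (y + 0.5) / side) for x, y in pts]
```

Consecutive points returned by `hilbert_points` differ by exactly one lattice step, so feeding the scaled points through a surface parametrization gives an unbroken path over the image surface.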

I encountered two difficulties with this NC technology:
rigidity and scaling. The computer driven milling
machine is essentially a mindless robot; the tool path
and trajectory have to be specified in complete detail
in advance. While it can do very well certain kinds
of elegant, accurate, and reproducible work, it is very
difficult to interrupt or reposition. After the programming
is done you hope you like what you get. Once the
general hardness, toughness, etc., of the material to be


Figure  4.  Hilbert  Surface  Filling  Space 
Curve  in  the  Image  of  the  Umbilic  Torus  NC 

cut  is  determined  the  material  is  not  relevant  and  not 
a  part  of  the  process.  As  for  scaling,  how  big  can  a 
robot  be?  Milling  machines  tend  to  be  built  around  the 
space  in  which  they  do  the  cutting,  they  don’t  reach  out 
anywhere.  They  are  profoundly  expensive  to  build  and 
maintain.  The  capital  cost  of  equipment  like  the  VB-2 
used  for  the  Umbilic  Torus  NC  was  between  gAf$  and 
gMS.  This  gives  a  active  cutting  region  of  four  or  five 
cubic feet maximum at a capital cost of 25 K$ per cubic
foot.

The interactive aspect of NC milling of a three-dimensional
object is limited to the computer graphics
previewing of the image, the tool path, and the trajectory
of the cutting apparatus.

VIP:  Virtual  Image  Projection 

The concepts involved in the virtual image projection
system were certainly motivated by the difficulties with
rigidity and scaling described above. Addressing scaling,
the active cutting region has increased to twenty
seven cubic feet at a capital cost of 0.37 K$ per cubic
foot. While there are sacrifices in accuracy in the
present system, they are not there in principle [1].

The virtual image projection system offers the possibility
of human interaction in a positive way. Software
for selection of tool trajectories is difficult to develop
and is not available in generality. Yet the relative
positioning involved in global tool trajectory selection is
something humans are well equipped to do. Humans
are less well equipped to do absolute quantitative
positioning.

A virtual image projection system strengthens the
latter and allows a wide range of interactive choices of
when and where to approach the desired image. The
software makes it very easy to reposition the virtual
image after relocating the material. Also, the system
is independent of any particular tool, so that a variety
of tools can be used to address the material. This
allows a sensitivity to the material which is important
for direct carving in natural stone. The process can be
interrupted, new images superimposed, and ‘quoted’ in
rescaled form.


Figure 5. A bronze Umbilic Torus NC sitting
on an enlarged Carrara marble quotation
of a fragment of itself. Note the Hilbert curve
articulation in the marble

The idea of the virtual image projector is simple yet
powerful: invert a 3D digitizer. The computer is used
as a kind of oracle. Inquiries are made about the location
of the desired image, which can be thought of as
present in the material. Specifically, give the computer
the software capability of calculating the nearest distance
from a point on the uncut material to the image.
We will refer to the uncut material as ‘the block’ since
the system has been used primarily for quantitatively
carving stone.

On the ceiling of my studio is a fixed triangle with
a string potentiometer at each vertex. Three steel cables
under tension meet at a ‘point’ plectrum and can
be pulled down to touch the block. The potentiometer
outputs are digitized and the information interfaced
with a Mac II. A foot mouse (rat) is used to click the
points into the Mac II.
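
Recovering the plectrum position from the three digitized cable lengths is a trilateration problem. The sketch below is an illustration only, not the studio software: it assumes the ceiling triangle lies in the plane z = 0 with vertices at (0, 0, 0), (d, 0, 0) and (i, j, 0), and that the plectrum hangs below the ceiling (z ≤ 0). The function name and parameter names are hypothetical.

```python
import math

def plectrum_position(r1, r2, r3, d, i, j):
    """Point at distances r1, r2, r3 from anchors (0,0,0), (d,0,0), (i,j,0).

    Of the two mirror-image solutions, the one below the ceiling plane
    (z <= 0) is returned.
    """
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    y = (r1 ** 2 - r3 ** 2 + i ** 2 + j ** 2) / (2 * j) - (i / j) * x
    z_squared = r1 ** 2 - x ** 2 - y ** 2
    if z_squared < 0:
        raise ValueError("cable lengths are not consistent with the anchors")
    return (x, y, -math.sqrt(z_squared))
```

In practice the measured lengths carry noise, so a real system would also need calibration of the anchor positions; none of that is modelled here.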


Figure 6. SP 1, the ceiling triangle with
string potentiometers at the end points. The
operator’s finger can be seen in the plectrum
forming a vertex of a tetrahedron under tension


Three general position registration points are selected
on the block. These three points are labelled
in some specific order. Three distinct labelled points in
general position on a block suffice to determine the
position and location of the block before and after a rigid
motion. If the block is moved then the three points are
touched with the plectrum and clicked into the Mac II.


Figure 7. Note the operator’s finger in the
plectrum forming a vertex of a tetrahedron
under tension and touching the registration
point labelled 1

The three general position registration points in order
can be thought of as rows of a 3 × 3 invertible matrix.
The position and orientation of the block is implicit in
this matrix. For example, think of the general decomposition
of the n × n invertible matrices

$$GL(n, \mathbb{R}) = A(n) \times D(n) \times SO(n)$$

into lower triangular unipotent, diagonal, and orthogonal
matrices, e.g., Gram-Schmidt orthogonalization. In
the case of n = 3, dim A(n) = 3 and dim SO(n) = 3,
where the semi-direct product A(n) × SO(n) corresponds
to the group of rigid motions. There is software
for relocating the three registration points on the block
as well as for realigning the virtual image with the new
block position.
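
One way to realign the virtual image after the block has been moved is to build an orthonormal frame from the three labelled points before and after the motion (a Gram-Schmidt style construction) and compose the two frames. The sketch below is a hedged illustration of that idea, not the software described above; `frame` and `rigid_motion` are hypothetical names, and the points are assumed to be measured without error.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def frame(p1, p2, p3):
    """Right-handed orthonormal frame attached to three non-collinear points."""
    e1 = _unit(_sub(p2, p1))
    e3 = _unit(_cross(e1, _sub(p3, p1)))
    e2 = _cross(e3, e1)
    return (e1, e2, e3)

def rigid_motion(old_pts, new_pts):
    """Return a function mapping old block coordinates to new ones."""
    f_old, f_new = frame(*old_pts), frame(*new_pts)
    # Rotation R sends each old frame vector to the matching new one.
    R = [[sum(f_new[k][r] * f_old[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]
    rotated_origin = tuple(sum(R[r][c] * old_pts[0][c] for c in range(3))
                           for r in range(3))
    t = _sub(new_pts[0], rotated_origin)

    def move(x):
        return tuple(sum(R[r][c] * x[c] for c in range(3)) + t[r]
                     for r in range(3))
    return move
```

Applying `move` to every sample of the virtual image (or its inverse to every probed point) keeps the image registered to the block after it is turned over.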


Figure 8. The block face with holes drilled
to the millimeter depths indicated
The virtual image in this application is resident in
the Mac II in the form of parametric equations. An
indication of how these parametric equations were designed
is in the mathematical design paragraph above. Finding
the right parametric equations to do specific things
is non-trivial. The parametric equation set could be
replaced by a previously digitized data set, systems of
splines, NURBS, etc. The software includes an algorithm
which calculates from any given point in space the nearest
distance from that point to the virtual image.


Figure 9. One side of the block has been
excavated to the depths of the holes drilled in
the stone to the nearest distance to the virtual
image. The virtual image is becoming
less virtual

Given a point on the block and the nearest distance,
one can safely remove an entire sphere whose radius is that
nearest distance and whose center is that point on the block. The
closer to the virtual image one is, the smaller the sphere.
In principle it does not matter what direction one drills
from the point; in fact one drills short to account for
the diameter of the drill.
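
The nearest-distance query and the resulting drill depth can be illustrated with a brute-force sketch. Everything here is an assumption made for illustration: the image is a circular torus standing in for the real parametric equations, the distance is estimated by sampling the parameter square on a grid, and the rule of subtracting the drill radius is one plausible reading of "drilling short". The names `torus`, `nearest_distance` and `drill_depth` are hypothetical.

```python
import math

def torus(u, v, R=3.0, a=1.0):
    """Stand-in virtual image: a circular torus."""
    return ((R + a * math.cos(v)) * math.cos(u),
            (R + a * math.cos(v)) * math.sin(u),
            a * math.sin(v))

def nearest_distance(point, surface, steps=200):
    """Smallest distance from point to a grid of surface samples.

    A sampled minimum can only overestimate the true nearest distance,
    so a practical system would refine it or keep a safety margin.
    """
    best = math.inf
    for i in range(steps):
        u = 2 * math.pi * i / steps
        for k in range(steps):
            v = 2 * math.pi * k / steps
            best = min(best, math.dist(point, surface(u, v)))
    return best

def drill_depth(point, surface, drill_radius, steps=200):
    """Depth to drill from point, stopping short by the drill radius."""
    return max(0.0, nearest_distance(point, surface, steps) - drill_radius)
```

Because the sphere of that radius around the probed point contains none of the image, the drilling direction does not matter in principle, which is what makes hand-held tools usable.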


Figure 10. The block has now been turned
over. The three labelled registration points
are clicked off in order so that the virtual image
is also ‘turned over’

The other side of the block has been quantitatively
carved after drilling to an acceptable accuracy. With
this equipment, accuracy can be achieved to a millimeter
or two. This piece has some undercut features which
were extrapolated. Three small registration holes that
were drilled into the block were left as reminders of the
quantitative origins of the piece.


Figure  11.  Frontal  view  of  the  Umbilic 
Torus  NIST.  Note  that  the  curve  of  cusps  goes 
once  the  long  way  and  thrice  the  short  way 


Future 

The next generation of Virtual Image Projector, SP
2, scheduled to be installed in my sculpture studio for
testing and evaluation, has six instead of three digitized
cables. These are arranged in Stewart platform format,
cf. [1]. The six cables terminate in pairs in the vertices
of a ceiling triangle and in the vertices of a neutrally
buoyant triangle with a rigidly affixed tool. The operator
interactively flies the triangle. Tool tip position
(x, y, z) coordinates and tool orientation (pitch, roll,
yaw) are available from the digital readout mounted on
the triangle. Software options include spatially parallel
hole drilling to the depth of the virtual image. This system
allows for an active cutting region (with undercuts
possible) of four cubic yards at a capital cost of 0.20 K$
per cubic foot.

A twenty foot version of this Stewart platform with
a chain saw attachment has been built at NIST. Larger
systems with spans of hundreds of feet are feasible [1].

Acknowledgements

I wish to thank Professor Jordan Cox and his asso
Figure 12. Quarter view of the Umbilic
Torus NIST. Note the cardioid cross-section
revolving about the curve of cusps and the
natural veins of the Carrara marble


Figure 13. SP 2, a Stewart Platform structure
coupling tool tip position and tool orientation
with six digitized cable lengths
Designing Solid Objects Using
Interactive Sketch Interpretation

ABSTRACT

Before the introduction of Computer Aided Design and
solid modeling systems, designers had developed a set of
techniques for designing solid objects by sketching their ideas
on pencil and paper and refining them into workable designs.
Unfortunately, these techniques are different from those for
designing objects using a solid modeler. Not only does this
waste a vast reserve of talent and experience (people typically
start drawing from the moment they can hold a crayon), but
it also has a more fundamental problem: designers can use
their intuition more effectively when sketching than they can
when using a solid modeler.

Viking is a solid modeling system whose user-interface is
based on interactive sketch interpretation. Interactive sketch
interpretation lets the designer create a line-drawing of a desired
object while Viking generates a three-dimensional object
description. This description is consistent with both the
designer’s line-drawing and a set of geometric constraints
either derived from the line-drawing or placed by the designer.
Viking’s object descriptions are fully compatible with
the object descriptions used by traditional solid modelers.
As a result, interactive sketch interpretation can be used with
traditional solid modeling techniques, combining the advantages
of both sketching and solid modeling.
1  INTRODUCTION

Sketching has long been an important element of the
design process. For hundreds of years, people have designed
by making quick, abstract drawings or "sketches." Sketching
was used both to specify embryonic concepts and to refine
these concepts into workable designs. Thirty or so years
ago, the advent of Computer Aided Design (CAD) and solid
modeling systems began to revolutionize some aspects of
the design process. These programs let designers create a
model of a three-dimensional object on the computer. This
model can then be analyzed in ways that would be difficult
or impossible without the computer. For example, CAD
systems and associated programs can display realistic images,
do stress analyses, and generate milling machine programs
from the computer’s model of the object.

Unfortunately, the CAD revolution did not extend to at
least two critical aspects of the design process: exploring new
ideas and refining these ideas into workable designs. With
current CAD systems, the model typically changes in large,
discontinuous steps. The designer is often forced to fully
specify a change before he or she has a chance to see how it
interacts with the rest of the model. This makes "feedback
driven" design, in which the designer uses feedback from one
change to guide the next change, difficult on a solid modeler:
the magnitude of each change is too large to let the designer
use his or her intuition effectively. As a result, designers will
often use pencil and paper to "work out" a change before
making the change on the computer.

The techniques used to design objects on pencil and paper
are different from those used to design objects on a solid
modeler [13]. Sketching, in this context, is a visual and
intuitive process in which a drawing is refined over time
by making small, incremental changes. At each point in
the process, the designer uses feedback from one change -
the appearance of the modified sketch - to guide the next
change. The continual feedback lets the designer use his or
her intuition effectively.

This paper presents a solid modeling system, Viking, that
lets the user design three-dimensional objects using techniques
normally used to create and refine two-dimensional
sketches. Viking uses interactive sketch interpretation to create
a "what you draw is what you get" user-interface. Users
can create a line-drawing of a desired object and use sketch
interpretation to generate a three-dimensional object that is
consistent with the line-drawing. Users can also place geometric
constraints on the object. These constraints, together
with a set of constraints derived from the line-drawing, are
used to define a vertex geometry in subsequent interpretations.
Geometric constraints let the user create precisely
dimensioned objects. The resulting user-interface combines
the power of traditional solid modeling systems with the
continuous feedback of sketching.

Figure 1: Viking's display.

2  THE VIKING SOLID MODELER

Viking extends the direct manipulation metaphor to
three-dimensional object design by letting the user modify an object
by changing its line-drawing. For most changes, deducing
an appropriate change in the object description is trivial. For
example, if the user erases a line, Viking deletes the corresponding
edge. With other changes, such as making a line-segment
visible, there is no obvious corresponding change in the object
description. Sketch interpretation is used in these cases
to generate a new object description that is consistent with
the modified line-drawing.

Sketch interpretation divides the task of interpreting a
line drawing into two parts: finding a surface-topology and
solving for a vertex geometry. The first part is done by generating
surface-topologies that are consistent with the line-drawing
until one that is acceptable to the user is found. The
second is done by using a geometric constraint solver to find
a vertex geometry that satisfies a system of constraints either
derived from the line-drawing and the proposed surface-topology,
or placed by the user. The surface-topology and
vertex geometry combine to form a three-dimensional object
description that is consistent with both the line-drawing and
the constraints.

2.1  VIKING'S  USER-INTERFACE 

Figure 1 shows Viking's display after creating an equilateral
triangle. The left window shows a line-drawing of the
underlying object description and the upper center window
shows the view transform used to generate the line-drawing.
Both windows let the user directly modify their contents.
The user can, for example, move a vertex by dragging it to a
new location with the mouse. The user can also dynamically
change the view transform by dragging the mouse across the
orientation triad, rotating the view about an axis perpendicular
to the mouse’s motion [9].

The line-drawing displays more than just an object’s
shape. Thick, thin and double lines respectively correspond
to edges adjacent to zero, one and two faces in the object
description. Circles correspond to vertices that can be moved
by the constraint solver when solving for a vertex geometry.
Triangles correspond to vertices whose positions are considered
fixed constants by the constraint solver. Constraints are
drawn in a variety of ways. Distance constraints, for example,
are shown by thin, bent lines. In Figure 1, the “A”
symbol at the bend indicates that all three sides of the triangle
have the same length.

2.1.1  VIKING'S COMMAND MODES

The four items shown in the center window of Figure 1
(Edit, Move, Constraint and Component) correspond to the
four most commonly used modes in Viking. These modes
determine how mouse actions in the line-drawing’s window are
interpreted. If the user enters either Constraint or Component
modes, the center window is overwritten with a specialized
menu.

Edit mode is used for changing the appearance of the
line-drawing displayed in the image window. While in it,
the user can draw new edges, erase old ones and change
the visibility of line-segments. For the first two actions,
both the line-drawing and the underlying object description
change. For the last action, only the line-drawing changes
immediately; the underlying object description is not always
modified. The Autosolve switch, located in the bottom center
window, determines whether Viking will automatically generate
a new interpretation after the user changes the visibility of a
line-segment, or wait until the user explicitly requests a new
interpretation.

Move mode is used for placing tacks, and moving vertices
and edges. Tacks are simple constraints that either lock a
vertex into a fixed position or force an edge to pass through a
point in space. If the Autosolve switch is on, Viking will use
the constraint solver to maintain the constraints as the user
drags a vertex or edge around with the mouse. Otherwise,
the vertex or edge will follow the mouse without maintaining
the constraints.

Constraint mode is used for placing or editing geometric
constraints on the object. The constraint menu lets the user
select a constraint template and then define constraints by
picking vertices or edges to "fill in" the blanks. The user can
also modify or delete previously defined constraints. Whenever
the user adds a constraint, Viking will attempt to find a
solution to the new system if the Autosolve switch is turned
on.

Component mode is used for manipulating groups of
vertices, edges and faces. Every component has a coordinate
transform that defines the effective position of its vertices.
The coordinate transform is generated from eleven
variables that control a component’s size (using both an
axis-independent variable and three axis-dependent variables),
position, and orientation (using quaternions [12]). The user can
lock or free these variables independently and the constraint
solver can manipulate the free variables when solving for a
vertex geometry.
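
A hedged sketch of how eleven such variables could define a vertex transform: one uniform scale, three per-axis scales, three position components and a four-component quaternion. The order of operations (scale, then rotate, then translate) and the name `transform_vertex` are assumptions made for illustration; the paper does not specify them.

```python
import math

def transform_vertex(p, uniform_scale, axis_scale, position, quaternion):
    """Apply scale, then rotation (unit quaternion w, x, y, z), then translation."""
    w, x, y, z = quaternion
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm  # normalize defensively
    sx, sy, sz = axis_scale
    px = uniform_scale * sx * p[0]
    py = uniform_scale * sy * p[1]
    pz = uniform_scale * sz * p[2]
    # Standard rotation matrix of a unit quaternion.
    R = ((1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)),
         (2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)),
         (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)))
    return tuple(R[r][0] * px + R[r][1] * py + R[r][2] * pz + position[r]
                 for r in range(3))
```

Locking a variable then simply means the solver treats it as a constant, while the free ones join the vertex coordinates as unknowns.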

2.1.2  SKETCHING IN THREE-DIMENSIONS

Sketching is traditionally done in only two dimensions.
With Viking, however, sketches are three-dimensional entities.
This both aids and hinders the user. A three-dimensional
“sketch” can help the user visualize the object it represents.
But it also means the user must specify the location of each
vertex in three-dimensions.


A simple mechanism for specifying a vertex’s approximate
location is needed. If the user can place every vertex
near its correct position, then the user can rotate the object and
the line-drawing will behave intuitively. This lets the user
continue the design process until he or she knows enough to
start using constraints to specify the vertices’ position precisely.
Also, since the vertices start close to a geometry that
satisfies the constraints, the constraint solver will need less
time to find a solution.

Geometric constraints are not, by themselves, a good
mechanism for specifying approximate vertex positions. In
part, this is because the constraint solver works best when
all vertices are near a solution. Relying on the constraint
solver to move a vertex a significant distance is, at best, time
consuming and often results in unexpected and unwanted
solutions (assuming any solution is found). A more fundamental
problem with using constraints for rough positioning,
however, is their precision. Often, users do not know the
precise location of a vertex until late in the design process.
Using constraints to position a vertex before the user knows
its precise location is time consuming since the constraints
will have to be changed later, when the precise dimensions
are known. It can also be intimidating: people do not like
answering questions until after they know the answers.

The user can position a vertex in three-dimensions by
showing where it “should be” in two different views.
Unfortunately, this technique forces the user to work in two
different views, which is difficult. For example, it is not always
obvious which vertex in one view corresponds to which
vertex in the other.

When no other information is available, Viking uses a
simple rule when drawing edges: both end-points have the
same z-coordinate in the display’s coordinate space. For
many cases, such as drawing a short edge from an existing
vertex, this is sufficient. In other cases, neither this method
nor the alternatives given above suffice. Because of this,
Viking provides two additional mechanisms to let the user
easily specify the location of a vertex in three-dimensions:
preferred directions and cutting planes.

Preferred directions are three-dimensional vectors. When
the user draws an edge, Viking draws short lines parallel to
each preferred direction at the new edge’s origin. As the user
moves the mouse, the edge’s endpoint is projected onto the
closest preferred direction.
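
The snapping step can be sketched as follows. This is an illustration under assumptions that are not from the paper: the view is orthographic, display coordinates are simply the x and y components of a point, and "closest" means the preferred direction whose on-screen line passes nearest the mouse. The name `snap_to_preferred` is hypothetical.

```python
import math

def snap_to_preferred(origin, mouse_xy, directions):
    """Return the 3D endpoint on the preferred direction nearest the mouse.

    origin     -- 3D start of the new edge (display coordinates)
    mouse_xy   -- 2D mouse position in the same display coordinates
    directions -- iterable of 3D preferred direction vectors
    """
    mx, my = mouse_xy[0] - origin[0], mouse_xy[1] - origin[1]
    best = None
    for d in directions:
        dx, dy = d[0], d[1]
        length_sq = dx * dx + dy * dy
        if length_sq < 1e-12:
            continue  # direction points at the viewer and cannot be picked
        t = (mx * dx + my * dy) / length_sq       # parameter along the line
        miss = math.hypot(mx - t * dx, my - t * dy)  # on-screen distance
        if best is None or miss < best[0]:
            best = (miss, t, d)
    if best is None:
        raise ValueError("no preferred direction is visible in this view")
    _, t, d = best
    return tuple(o + t * c for o, c in zip(origin, d))
```

The depth of the endpoint comes for free: once the parameter t along the chosen 3D direction is known from the screen projection, the z component follows from the direction vector.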

Preferred directions can be defined in two ways. First,
the user can define vectors in object space, such as the x,
y and z axes, for preferred directions. Any new edge, no
matter where it is drawn, will be able to use these preferred
directions. Second, the user can put preferred directions on
automatic. In this case, Viking automatically defines preferred
directions depending on the context in which the user
started to draw the new edge. If the user is drawing an edge
from an existing vertex, then the preferred directions are defined
to be parallel to each of the edges radiating from the
vertex. If the user is drawing an edge from an existing edge,
then one preferred direction is defined to be parallel to the
edge and, for each adjacent face, a preferred direction is defined
to lie in that face’s plane and be perpendicular to the
edge. If these rules generate one preferred direction, then two
preferred directions are added that are perpendicular to the
original preferred direction and each other. If two preferred
directions were generated, then a third preferred direction
perpendicular to the first two is added.
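
The completion rule for one or two automatically generated directions might look like the sketch below. The choice of which perpendicular vectors to add when only one direction exists is not specified in the text, so the helper-axis trick used here is an assumption, as is the name `complete_directions`.

```python
import math

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def complete_directions(directions):
    """Extend one or two preferred directions to three, as described above.

    With two inputs they are assumed not to be parallel; with three or
    more inputs the list is returned normalized but otherwise unchanged.
    """
    dirs = [_unit(d) for d in directions]
    if len(dirs) == 1:
        d = dirs[0]
        # Helper: the coordinate axis least aligned with d (an arbitrary choice).
        k = min(range(3), key=lambda i: abs(d[i]))
        helper = tuple(1.0 if i == k else 0.0 for i in range(3))
        second = _unit(_cross(d, helper))
        dirs += [second, _cross(d, second)]
    elif len(dirs) == 2:
        dirs.append(_unit(_cross(dirs[0], dirs[1])))
    return dirs
```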

A cutting plane is a plane defined in object space. Cutting
planes are a tool for both positioning a vertex in
three-dimensions and helping the user visualize the object’s
three-dimensional structure. The user can position a vertex in
three-dimensions by moving it parallel to the cutting plane
or parallel to the cutting plane’s normal.

The  user  can  manipulate  the  cutting  plane  by  moving  it 
parallel  to  its  normal,  changing  the  orientation  of  its  normal, 
and  controlling  the  way  in  which  it  is  displayed.  The  user 
can,  among  other  things,  make  the  cutting  plane  opaque  or 
translucent,  highlight  the  intersection  of  the  cutting  plane 
with  the  object,  show  the  orthogonal  projection  of  the  object 
onto  the  cutting  plane,  and  draw  height  poles  between  each 
vertex  and  the  cutting  plane. 

3  IMPLEMENTATION 

Viking's implementation of interactive sketch interpretation
uses two distinct data-structures: one holds the current
object description and the other holds the line-drawing displayed
to the user. The user can modify the line-drawing
and most changes automatically propagate to the current object
description, maintaining consistency between the two
data-structures. The user can also change the viewpoint, in
which case the line-drawing is recreated from the new view
transform and the current object description.

Sketch interpretation generates a new object description
when the user makes a change that cannot be propagated to
the object description automatically. Viking’s sketch interpretation
algorithm splits the task of generating a new object
description into two parts: finding a surface-topology that
is consistent with the line-drawing and solving for a vertex
geometry that satisfies the object’s implicit and explicit
constraints. Together, the surface-topology and the vertex
geometry completely describe a three-dimensional object.
The new object description is consistent with both the
line-drawing created by the user and any geometric constraints he
or she may have specified.

Viking uses arc-labeling [10], an extension of
Huffman-Clowes line-labeling [3, 8] to non-trihedral vertices, to generate
a surface-topology from a line-drawing and an old object
description. The surface-topology defines a set of faces that
are consistent with the line-drawing. Since line-drawings can
have many different interpretations, Viking uses heuristics to
seek out the more desirable interpretations first. Viking generates
surface-topologies in order of increasing cost, where
the cost is based on several heuristics, including:

•  how similar the surface-topology is to the current object’s
surface-topology and

•  if the user has given a preferred object type, how close
the surface-topology is to the user’s preferred type.

Surface-topologies  are  generated  until  the  user  either  accepts 
one  or  aborts  the  search.  In  my  experience,  the  desired 
surface-topology  is  normally  the  first  surface-topology  found. 
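
Presenting candidates in order of increasing cost can be illustrated with a priority queue. The cost function below is purely hypothetical: it counts faces that differ from the current topology and adds a fixed penalty when the candidate's type is not the preferred one. Neither the weights nor the representation (faces as frozensets of vertex names) come from Viking, and a real implementation would generate candidates lazily from the labeling search rather than from a finished list.

```python
import heapq

def topologies_by_cost(candidates, current_faces, preferred_type=None,
                       type_penalty=10):
    """Yield (cost, faces, kind) for each candidate, cheapest first.

    candidates    -- iterable of (faces, kind) pairs
    current_faces -- faces of the current object's surface-topology
    """
    current = set(current_faces)
    heap = []
    for order, (faces, kind) in enumerate(candidates):
        cost = len(set(faces) ^ current)  # faces added or removed
        if preferred_type is not None and kind != preferred_type:
            cost += type_penalty
        # 'order' breaks ties so the heap never compares face lists.
        heapq.heappush(heap, (cost, order, faces, kind))
    while heap:
        cost, _, faces, kind = heapq.heappop(heap)
        yield cost, faces, kind
```

The caller would show each yielded topology to the user and stop as soon as one is accepted or the search is aborted.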

Once an acceptable surface-topology has been found, a
non-linear constraint solver finds a vertex geometry that satisfies
a system of geometric constraints. These constraints
fall into three categories:

•  world:  every  face  is  a  planar  polygon. 

•  image:  visible  lines  are  in  front  of  obscuring  faces. 

•  explicit:  constraints  explicitly  defined  by  the  user. 

The first two types of constraints are implicit constraints
since they are automatically generated by Viking. World and
explicit constraints are always part of the system of equations
used by the constraint solver. Image constraints are only
used when finding a vertex geometry after generating a new
surface-topology for the object.
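
The paper does not give the equations behind the world constraints, so the following is only one plausible way to write "every face is a planar polygon" as residuals that a solver can drive to zero. The function name and the choice of the first three vertices as the reference plane are assumptions.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def planarity_residuals(face):
    """One residual per vertex beyond the third.

    The first three vertices (assumed non-collinear) define a plane;
    each residual is a scalar triple product that is zero exactly when
    the corresponding vertex lies in that plane.
    """
    p0, p1, p2 = face[:3]
    normal = cross(sub(p1, p0), sub(p2, p0))
    return [dot(normal, sub(p, p0)) for p in face[3:]]


flat_square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
bent_square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1)]
print(planarity_residuals(flat_square), planarity_residuals(bent_square))
```

A triangular face yields no residuals at all, which matches the intuition that three vertices are always coplanar.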

The  constraint  solver  uses  an  algorithm  developed  by 
Bullard  and  Biegler  [2].  This  algorithm  repeatedly  solves  a 
system  of  linear  equations  derived  from  the  non-linear  equa¬ 
tions  and  their  first  derivatives  until  the  global  error  is  reduced 
below  a  threshold.  The  vertex  positions  from  the  current  ob¬ 
ject  are  used  as  the  initial  solution  for  the  new  system  of 
constraints.  The  solver  tends  to  move  the  vertices  only  in 
small,  well  controlled  steps  and,  as  a  result,  solutions  tend 
not  to  differ  unnecessarily  from  the  vertex  geometry  in  the 
current  object. 
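
The general idea of repeatedly linearizing the constraints and solving a linear system can be sketched as below. This is a generic minimum-norm Newton iteration, not the Bullard and Biegler algorithm cited above; the finite-difference Jacobian, the tolerance, and the triangle example are all assumptions made for illustration.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def jacobian(constraints, x, h=1e-6):
    """First derivatives of every constraint by central differences."""
    rows = []
    for c in constraints:
        row = []
        for i in range(len(x)):
            xp, xm = x[:], x[:]
            xp[i] += h
            xm[i] -= h
            row.append((c(xp) - c(xm)) / (2 * h))
        rows.append(row)
    return rows


def solve_constraints(constraints, x0, tol=1e-9, max_iter=50):
    """Iterate from x0 until the global error falls below tol."""
    x = list(x0)
    for _ in range(max_iter):
        c = [f(x) for f in constraints]
        if math.sqrt(sum(v * v for v in c)) < tol:
            break
        J = jacobian(constraints, x)
        # Minimum-norm step: solve (J J^T) lam = -c, then dx = J^T lam.
        JJt = [[sum(p * q for p, q in zip(ri, rj)) for rj in J] for ri in J]
        lam = solve_linear(JJt, [-v for v in c])
        for i in range(len(x)):
            x[i] += sum(J[k][i] * lam[k] for k in range(len(J)))
    return x


def sq_len(x, i, j):
    """Squared distance between 2D vertices i and j stored flat in x."""
    return (x[2 * i] - x[2 * j]) ** 2 + (x[2 * i + 1] - x[2 * j + 1]) ** 2


# Roughly sketched triangle; constrain all three edges to equal length.
rough = [0.0, 0.0, 1.1, 0.0, 0.4, 0.9]
equal_edges = [
    lambda x: sq_len(x, 0, 1) - sq_len(x, 1, 2),
    lambda x: sq_len(x, 1, 2) - sq_len(x, 2, 0),
]
solved = solve_constraints(equal_edges, rough)
```

Starting from the current vertex positions and taking the smallest step that satisfies the linearized constraints is one way to obtain the behavior described in the text, where solutions stay close to the existing geometry.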

Once an acceptable surface-topology and vertex geometry
have been found, Viking replaces the current object
description with the new interpretation. A new line-drawing is
then generated from the new current object description and
the current view transform. The user can manipulate the new
line-drawing just like the old one, letting the user continue
the cycle of modification and interpretation.

4  EXAMPLES 

4.1  CREATING  A  CHAIR 

This section describes a session using Viking to create an
"easy chair." This example is somewhat contrived (for example,
chairs are not normally made from homogeneous blocks)
but it does convey the flavor of Viking's user-interface. It also
demonstrates how modifying the line-drawing can be used as
a substitute for constructive solid geometry. It took me less
than two minutes to transform the cube in Figure 2a into the
chair in Figure 2i.

Preferred directions (see Section 2.1.2) were on automatic
throughout this example. As a result, whenever the
user started to draw an edge, Viking defined a set of
context-dependent vectors that could be used to position the
edge's endpoint in three-dimensions. For example, preferred
directions made it possible to draw the new edge in Figure 2b so
that it was parallel to the edge between the upper and lower
vertices at the right and back of the cube.

Figure 2a shows the initial object, a cube loaded from
a library of standard objects. The first step in turning this
cube into a chair is to add a raised back. Figure 2b shows
the user drawing a new edge up from the upper-right corner
of the cube. The user has finished drawing the edges for the
chair's back in Figure 2c and is in the process of hiding the
line-segments that would be obscured if the chair's back were
solid and opaque.

In Figure 2d, the user deleted one unwanted vertex and
is in the process of deleting the other (the user must pick a
vertex twice to delete it; the first pick highlights the selected
vertex, the second deletes it). These vertices are unwanted
because deleting them and redrawing the missing edges
ensures that the chair's back is a single, planar surface. If these
vertices had not been deleted, Viking would have found an
interpretation in which the chair's back and sides were each
formed by two faces.

Deleting a vertex also deletes its adjacent edges and faces,
although Viking preserves the hidden status of line-segments
whose obscuring face is deleted. For example, in Figure 2d,
the line at the bottom-back of the cube is drawn with a single,
thin line (indicating that it is adjacent to only one face) since
the top, back and right faces of the cube were deleted when
the first vertex was deleted. Also, the entire line remains
hidden, even though the face obscuring its right segment has
been deleted.

Figure 2e shows the user redrawing some of the edges that
were deleted when the user deleted the unwanted vertices,
in preparation for using sketch interpretation to generate a
new object description. Figure 2f shows, from a different
viewpoint, the user starting to draw a lowered seat on the
first interpretation found for Figure 2e. Since the user had set
the search bias to prefer solid objects, Viking sought out an
interpretation corresponding to a solid object. As a result, the
interpretation contains faces that were not needed to generate
an object description consistent with Figure 2e since they
would have been hidden by the rest of the chair.

The  user  has  finished  drawing  a  lowered  seat  for  the  chair 
in  Figure  2g  and  is  in  the  process  of  removing  some  unwanted 
and  unnecessary  edges.  In  Figure  2h,  the  user  is  exposing 
the  line-segments  that  would  be  visible  if  the  chair’s  seat  was 
lower  than  its  arm  rests.  Figure  2i  shows,  from  a  different 
viewpoint,  the  first  interpretation  found  for  Figure  2h. 

Even though the chair looks correct in Figure 2i, the
geometry is not correct. For example, some edges that should
be parallel to each other are skewed about 10°. These
problems can be fixed in a minute or two by using geometric
constraints. But, since the next example demonstrates the
constraint solver, that part of the design process is skipped.


Figure  2c:  Hiding  obscured  line-segments. 


Figure  2d:  Remove  unwanted  vertices  and  edges. 


Figure 2g: Remove unwanted edges.


Figure  2f:  Drawing  the  chair’s  seat. 


Figure  2i:  The  “completed"  chair. 


4.2  AN  EXERCISE  IN  GEOMETRY 


Suppose you have the following problem: if you place
a solid equilateral tetrahedron face to face with a solid
equilateral octahedron, how many faces does the resulting
polyhedron have? The polyhedra are positioned and sized so
that three of the tetrahedron's vertices coincide with three of
the octahedron's vertices. Answering this question, by using
Viking to create the object shown in Figure 3l, takes me less
than three minutes.

Figure 3a shows the user starting to draw the two polyhedra.
In Figure 3b, the user has changed the view transform
by rotating it about the horizontal axis and is in the process
of completing the octahedron's wire-frame. Figure 3c shows
the user hiding the line-segments at the "back" of the
polyhedra. Figure 3d shows the first interpretation found after
hiding the rest of the line-segments that should be obscured.

The  edges  in  Figure  3d  were  drawn  without  using  either 
preferred  directions  or  a  cutting  plane  to  position  the  ver¬ 
tices  in  three-dimensions.  The  user  made  no  attempt  to  draw 
the edges so that they all had exactly the same length.
Instead, geometric constraints will be used to turn these "rough
sketches" into equilateral polyhedra.

Figure  3e  shows  the  effect  of  adding  and  solving  for 
equal  length  constraints  on  the  tetrahedron’s  edges.  Figure  3f 
shows  the  effect  of  placing  a  similar  set  of  constraints  on  the 
octahedron. The bent lines and "A" symbols indicate that
all of the tetrahedron's edges have the same length. The
bent  lines  and  “B”  symbols  do  the  same  for  the  octahedron’s 
edges.  In  both  Figures  3e  and  3f,  the  vertices  have  moved  to 
accommodate  the  constraints.  Figure  3g,  in  which  display  of 
the  constraints  has  been  turned  off,  shows  the  two  polyhedra 
from  a  different  direction. 

In Figure 3h, the user has added, but not yet solved
for, constraints forcing three of the tetrahedron's vertices
to be coincident with three of the octahedron's vertices. The
bent line and "0" symbol indicate that the distance between
the vertices should be zero. Figure 3i shows the solution
found by the constraint solver to the system described in
Figure 3h. Figures 3h and 3i have, despite appearances, identical
surface-topologies; the constraint solver moved the vertices
without changing the underlying structure.

In Figure 3j, the view transform has been changed to give
a view "straight-down" one of the edges where the tetrahedron
and octahedron are in contact. This view suggests that
the vertices to either side of this edge are co-planar, forming
a single four-sided face. In Figure 3k, the user has merged
the six coincident vertices into three vertices, deleted the
unwanted edges, and generated a new, seven-sided,
interpretation. Figure 3l shows Figure 3k with all constraints hidden.
Since all faces must be planar, Viking would not be able to
find a vertex geometry for Figure 3k unless the quadrilateral
faces were planar polygons. The answer, therefore, to
the question posed at the beginning of this section is that a
tetrahedron and octahedron form a seven-sided polyhedron.
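
The seven-face answer can be checked independently of Viking with exact integer arithmetic. The sketch below assumes one convenient set of coordinates (an octahedron with vertices on the axes and a tetrahedron glued to its all-positive face with apex at (1, 1, 1)); the helper names are illustrative only.

```python
from itertools import combinations, product
from math import gcd


def plane_of(p, q, r):
    """Canonical integer plane (a, b, c, d) with a*x + b*y + c*z = d."""
    u = tuple(q[i] - p[i] for i in range(3))
    v = tuple(r[i] - p[i] for i in range(3))
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    d = sum(n[i] * p[i] for i in range(3))
    g = gcd(gcd(abs(n[0]), abs(n[1])), gcd(abs(n[2]), abs(d)))
    coeffs = tuple(c // g for c in (*n, d))
    # Make the first non-zero coefficient positive so that both
    # orientations of the same plane compare equal.
    if next(c for c in coeffs if c != 0) < 0:
        coeffs = tuple(-c for c in coeffs)
    return coeffs


# Octahedron: one triangular face per sign choice (sx, sy, sz).
octa_faces = [((sx, 0, 0), (0, sy, 0), (0, 0, sz))
              for sx, sy, sz in product((1, -1), repeat=3)]

# Tetrahedron on the octahedron face x + y + z = 1, apex at (1, 1, 1).
tetra_vertices = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
tetra_faces = list(combinations(tetra_vertices, 3))

shared = frozenset([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
exposed = [f for f in octa_faces + tetra_faces if frozenset(f) != shared]

# Each exposed tetrahedron face shares an edge with an octahedron face,
# so faces lying in the same plane merge into one quadrilateral.
planes = {plane_of(*f) for f in exposed}
print(len(exposed), "triangles before merging,", len(planes), "faces after")
```

Grouping by plane is sufficient here only because every coplanar pair is also adjacent along a shared edge; in general, coplanar faces that do not touch would remain separate.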


Figure 3a: Drawing the polyhedra.


Figure 3b: Completing the wire-frames.


Figure 3c: Hiding obscured line-segments.


Figure 3d: Generating an interpretation.


Figure 3e: Making an equilateral tetrahedron.

5  FUTURE  WORK

Viking's user-interface has some significant weaknesses.
Some of these problems should not be difficult to solve.
Others do not seem to have easy solutions. These problems
are presented in the order that they will be addressed in future
research.

CAD  modeling  interface 

Currently, Viking provides few of the capabilities
found  in  conventional  solid  modeling  systems.  For 
example,  Viking  can  neither  calculate  the  mass  of  an 
object  nor  find  the  intersection  of  two  objects.  Com¬ 
bining  conventional  solid  modeling  capabilities  and  in¬ 
teractive  sketch  interpretation  should  not  be  difficult; 
Viking's  underlying  object  description  is  equivalent  to 
the  boundary  representation  description  used  by  some 
solid  modelers. 

Explicit  constraints  specification 

Viking's users must explicitly specify every geometric
constraint. Other constraint based design systems,
such as Snap-Dragging [1] [5], provide mechanisms
for defining constraints implicitly. Incorporating
similar mechanisms into Viking could alleviate one of
the more tedious aspects of Viking's user-interface.

Planar  faces  and  straight  edges 

Viking can, currently, only interpret line-drawings
of objects with planar faces. The sketch interpretation
algorithm can be easily extended to objects with
non-planar faces. Modifying the rest of Viking, however, is
more difficult; planar faces provide one of the better
implicit constraints, and designing a good user-interface
for letting the user specify which faces are non-planar
and control the shape of a non-planar face is not
easy.

Quadhedral  vertices 

Viking  can  only  analyze  line-drawings  in  which 
every  vertex  is  adjacent  to  four  or  fewer  edges.  This  is 
because  Viking's  sketch  interpretation  algorithm  must 
match  every  intersection  in  the  line-drawing  to  an  entry 
in  a  fixed  intersection  library.  This  library  contains  all 
possible  intersections  of  two,  three  and  four  lines.  The 
program  used  to  generate  Viking's  intersection  library, 
however,  is  already  capable  of  generating  entries  for 
intersections  of  five  or  more  lines.  Adding  this  capa¬ 
bility  to  Viking  should  not  be  difficult. 

Simple  polygonal  faces 

Faces  in  Viking  must  be  simple,  planar  polygons; 
no  internal  holes  or  repeated  edges  or  vertices.  It 
should  be  possible  to  extend  the  algorithm  to  allow 
more  complicated  faces,  although  it  may  not  be  worth 
the  extra  processing  time  required.  The  current  version 
of  Viking  lets  the  user  simulate  holes  and  the  like  by 
using  artifact  edges. 


Explicit  topology  specification 

Viking's sketch interpretation algorithm uses the
presence of hidden line-segments to automatically
reject inconsistent interpretations. The downside of
this is that the user must correctly indicate which
line-segments are hidden. This can be a tedious and
time-consuming process.

Viking  lets  the  user  generate  a  blind  interpretation, 
in  which  the  visibility  cues  are  ignored  and,  there¬ 
fore,  the  user  does  not  have  to  indicate  which  line- 
segments  are  hidden.  Blind  interpretations  are  slower 
and  less  discriminating  than  conventional  interpreta¬ 
tion,  since  visibility  cues  can  not  be  used  to  reject 
unwanted  topologies.  Despite  this,  it  is  often  easier 
to  generate  a  blind  interpretation  and  manually  reject 
unwanted  topologies  than  it  is  to  indicate  which  line- 
segments  are  hidden  and  generate  a  standard  interpre¬ 
tation. 

5.1  OPEN  PROBLEMS 

The following section describes problems that do not
seem to have easy solutions.

General  view 

Viking's sketch interpretation algorithm can only
interpret line-drawings that correspond to a general
view of an object. A general view is one in which a
small change in the view direction makes a correspondingly
small change in the line-drawing [11]. So, for
example, a general view could not contain any faces
that are "edge-on" to the viewer (such as Figure 3j).

This  is  a  problem,  since  engineering  drawings  often 
do  not  correspond  to  general  views.  However,  it  is  not 
clear how significant this problem is. Engineering
drawings often use specialized viewpoints because,
historically,  specialized  views  were  easier  to  draw  or 
because  they  illustrated  a  particular  point.  Specialized 
views  are  not,  for  the  most  part,  easier  to  interpret 
than  general  views  and  both  types  of  views  are  easy  to 
generate  using  the  computer. 

One possibility for generating interpretations of
specialized views is to use graph based algorithms [4]
[7]. These algorithms do not depend on the viewpoint,
generating a surface-topology by finding a planar
embedding of an object's vertex-edge graph. Unfortunately,
these algorithms probably could not be modified
to use Viking's search heuristics.

Sketch  Interpretation  performance 

Viking's sketch interpretation algorithm is not as
fast as one might wish, taking almost three minutes to
generate an interpretation of a line-drawing containing
100 points. The time required to generate an interpretation
seems to be roughly proportional to the square
of the number of points in the line-drawing. Although
faster workstations and more efficient algorithms may
alleviate this problem, it is not realistic to expect that
Viking's sketch interpretation algorithm could be used
on large objects (which might be three or four orders of
magnitude more complex than the objects created in
Sections 4.1 or 4.2). It should, however, be possible
to automatically partition a large object and use sketch
interpretation on only the relevant parts.
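
To see why the quadratic growth rules out large objects, the figures quoted above can be extrapolated. The sketch assumes that "almost three minutes" is about 180 seconds, that complexity is measured by the number of points, and that the quadratic trend continues to hold at larger sizes; none of these is stated more precisely in the text.

```python
def estimated_seconds(num_points, ref_points=100, ref_seconds=180.0):
    """Extrapolate interpretation time, assuming time ~ points squared."""
    return ref_seconds * (num_points / ref_points) ** 2


# Objects three and four orders of magnitude larger than 100 points.
for factor in (1, 1_000, 10_000):
    points = 100 * factor
    days = estimated_seconds(points) / 86_400
    print(f"{points} points: about {days:.3g} days")
```

Under these assumptions an object a thousand times larger would take a million times longer, which is why partitioning the object is suggested instead.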

Constraint  satisfaction  performance 

Viking's constraint solver is used in two basic
modes: when one or more constraints have been added
and Viking must find a solution, and when the user
is moving a vertex by dragging it with the mouse and
wishes to maintain the pre-existing constraints. The
response time when dragging is far slower than desired,
often taking several seconds to find a solution
that satisfies all the constraints. It might be possible
to use differential constraints [6] to improve response
times when dragging.

6  CONCLUSIONS 

Viking  is  a  solid  modeling  system  that  uses  interactive 
sketch  interpretation  to  combine  the  simplicity  of  pencil  and 
paper  sketches  with  the  power  of  a  solid  modeling  system. 
Viking lets designers draw the object they wish to create and
then modify it by changing the line-drawing to make it "look
right." Each action is obvious from context, leaving the
designer free to concentrate on the design itself and not how
to convey it to the solid modeler.

This ease of use comes without sacrificing any of the
capabilities intrinsic to solid modeling systems. As with other
solid modeling systems, Viking lets the designer manipulate
the underlying object description as if it were a solid
object. This provides the designer with a powerful tool for
visualizing an object's structure. For example, the designer
can wiggle the object by dynamically changing the view
transform or drag a translucent cutting plane through the
object to see where vertices lie with respect to one another in
three-dimensions. And, although Viking's user-interface is
based primarily on sketching, the designer can create
precisely dimensioned models by using geometric constraints.
This combination of sketching and solid modeling techniques
creates an effective user-interface for developing ideas into
practical designs.

ABSTRACT

We have implemented a system for Computer-Aided Plastic
Surgery. Planning plastic surgery procedures is complex because the
surgeon needs to stretch and reshape the patient's skin to replace
missing tissue while minimizing distortion of the surrounding
tissue. Traditional planning techniques rely on the surgeon's
experience to select among a myriad of possible procedure designs.
While  mathematical  techniques  for  predicting  the  outcome  of  sur¬ 
gery  have  been  proposed  in  the  past,  these  are  not  in  widespread 
use  by  surgeons  because  they  require  the  surgeon  to  perform  man¬ 
ual  constructions  and  geometric  calculations.  Our  system  makes 
the  analysis  process  easier  by  allowing  the  surgeon  to  draw  the 
surgical  plan  directly  on  a  3D  model  of  the  patient.  An  automatic 
mesh  generator  is  used  to  convert  that  drawing  into  a  well-formu¬ 
lated  problem  for  finite  element  analysis. 

Key Words

Interactivity, 3D Graphics, Computer-Aided Surgery, Plastic
Surgery, Surgical Simulation.

1.  INTRODUCTION 

This  paper  describes  our  experience  designing  a  Computer-Aided 
Plastic  Surgery  (CAPS)  system.  The  system  provides  surgeons 
with  a  computer  graphics  environment  in  which  they  can  explore 
the  biomechanical  implications  of  surgical  alternatives.  The  CAPS 
system uses a combination of interactive 3D computer graphics,
automatic  mesh  generation  algorithms,  physically-based  modeling 
using  the  Finite  Element  Method,  and  animated  visualization  of 
the  surgical  result.  We  have  implemented  the  system  and  have  had 
it  evaluated  by  a  number  of  practicing  plastic  surgeons  with  very 
positive  results. 

Computerized  planning  represents  an  important  development 
for  plastic  surgeons  because  their  current  techniques  do  not  allow 
iterative  problem  solving.  Today,  a  surgeon  must  observe  and  per¬ 
form  many  operations  to  build  up  the  experience  about  the  effect 
of  changes  in  the  surgical  plan.  Each  of  these  operations  is  unique, 
and  it  is  difficult  to  isolate  the  effects  of  different  surgical  options 
since  the  result  is  also  influenced  by  many  patient  specific  vari¬ 
ables.  The  CAPS  system  allows  exploration  of  the  various  surgical 
alternatives with the ability to modify the existing plan, or to create
a  new  plan  from  scratch.  This  process  may  be  repeated  as  many 
times  as  needed  until  the  surgeon  is  satisfied  with  the  plan. 

In our view, it is crucial that the user interface to the system not
burden the physician with the implementation details of the
computational model. Specifically, the physician should not be required
to manipulate points and polygons, or nodal points and elements of
the finite element model. Our work follows a task-level
analysis[33] of the goals of plastic surgery. In this system the surgeon
deals directly only with the problems associated with the task,
namely identifying the clinical problem, selecting the surgical
procedure to apply, and specifying the execution of the procedure.
All other aspects of the analysis are carried out automatically. The
interface to the CAPS system is designed to simulate the process of
drawing on the patient's skin with a marker, as is done when the
surgery is transferred to the patient in the operating room.

The  remainder  of  this  paper  describes  the  techniques  used  in  the 
implementation  of  the  CAPS  system.  This  is  motivated  by  a 
review  of  related  work  and  a  brief  discussion  of  the  goals  of  plastic 
surgery  and  the  problems  faced  by  the  clinician.  The  following 
sections  describe  the  simulation  model,  and  the  clinician's  inter¬ 
face  to  the  system.  We  then  look  in  detail  at  the  mesh  generation 
algorithms  that  convert  the  surgical  plan  into  a  well-formed  prob¬ 
lem  for  finite  element  analysis. 

2.  BACKGROUND 

Previous  work  has  concentrated  on  either  building  mathematical 
models  of  the  soft  tissue  mechanics  in  order  to  analyze  specific  test 
cases,  or  on  imaging  systems  that  present  renderings  of  volumetric 
scans  of  the  patient.  Our  work  is  an  attempt  to  bring  these  two 
components  together  with  a  powerful  user  interface.  This  results  in 
a system where the simulation procedures are attached to the
graphical model, a combination which allows the surgeon to
operate on the graphical model in a manner directly analogous to
operating on the real patient. This approach is crucial for the
successful clinical application of mechanical analysis of soft tissue
because without the assistance of a computer graphics tool the
surgeon has neither the time nor the training to formulate a specific
surgical case at the level of detail required for analysis.

Mechanical  Analysis  of  Plastic  Surgery 
Previous  research  in  biomechanical  analysis  of  plastic  surgery  has 
not  included  methods  for  automatically  converting  a  surgical  plan 
into  a  form  appropriate  for  the  analysis  programs.  For  example,  in 
her  work  on  analysis  of  plastic  surgeries,  Deng  describes  a  system 
in  which  the  user  is  required  to  type  an  input  file  which  describes 
the incision geometries, regions of tissue to simulate, and
constraint conditions on the tissue in terms of their world space
coordinates[11]. Kawabala and his coworkers describe their techniques
for analysis of surgical procedures but report no method for
automatically generating a mesh for a particular plan[16]. Larrabee
discusses the problem of modeling arbitrary incision geometries
using graphical input devices, but the solution he proposes requires
the user to define each of the dozens of analysis nodes and
elements[18]. While Larrabee's approach is useful for small
two-dimensional analyses (which is the way Larrabee used it), the
approach becomes unmanageable for three-dimensional structures
with a greater number of nodes. The user interface and mesh
generation techniques described in this paper begin to address these
three-dimensional problems.

Computer  Graphics  Models  of  Skin 
Waters describes a system for simulating the expressive
action of facial muscles through a combination of pre-defined
action units[30]. Waters and Terzopoulos subsequently extended
this technique to include physically-based dynamics of the skin in
response to the muscle action[31]. However, their system could
not be used directly for plastic surgery simulation because it does
not support cutting and suturing. In addition, their physical model
is based on the mass-and-spring lattice approach, which we feel is
more difficult to control and less accurate than the finite element
method.

Volumetric  Approaches 

Previous computer graphics work has emphasized special purpose
rendering algorithms for visualization of data obtained from
volumetric scans of the patient[22;19;9], or geometric methods for
extracting and repositioning pieces of the volume data[7;28]. Our
approach differs since the CAPS system integrates a biomechanical
simulation with a graphic presentation.

Interactive  Computer  Graphics  for  Surgical  Simulation 
The terms surgical simulation[24] and Computer-Aided
Surgery[21;5] have both been used to refer to the combination of
physically based modeling of the human body and interactive
computer graphics applied to planning and analysis of surgical
procedures. In an example of this approach, Delp et al. have
created a system for simulating tendon transfer operations on the
lower extremity[10]. This system includes a geometric model of
the major bones of the hip and leg, a kinematic model of six joints,
and a mechanical model of 43 muscle-tendon actuator units. A 3D
graphics interface can be used to select and move tendon
attachment points. Thompson et al. have developed a similar system for
hand surgery[27]. Our work on the CAPS system is most similar in
spirit to, and was inspired by, the work of these groups.

3.  GOALS  OF  PLASTIC  SURGERY 

The goal of plastic surgery is to create a proper contour by
making the best distribution of available materials.
Operations take place on relatively limited surface areas and, in
local procedures, skin cover is not brought from distant
areas.* Rather, skin should be borrowed and redistributed
in the area where the operation is being carried out. In this
way, surgeons should be able to perform typical plastic
operations that will restore proper form to distorted
surfaces. Different maneuvers are used in various combinations
as either simple or complex figures. The location,
form, and dimensions of the incisions necessary for plastic
redistribution of tissues determine the plan of the operation.

A. A. Limberg, M.D.[20]

Applications of plastic surgery include repairing lesions caused
by disease, replacing skin lost to burns or amputations, rebuilding
features misshapen by injury or birth defects, and removing excess
tissue to reduce the visual effects of aging[13]. This is
accomplished through the precise application of surgical techniques
including excision (removal) of tissue, direct closure of a wound
site, and a variety of flap transposition and rearrangement
surgeries. Each of these results in a redistribution of the available tissue
and requires the application of plastic surgery principles to
produce the optimum contour.

* In contrast to skin grafting operations.

An example plastic surgery (simulated on the CAPS system) is
shown in Figures 6 and 7. This procedure combines excision of a
tumor with two flap transpositions. The flap transpositions have
the effect of using tissue from the area surrounding the excision to
relieve the stress caused by covering the wound. The resultant
effects on the surrounding tissue contour can be seen. This
includes distortions, redistribution, and standing cones (dog ears)
at the point of rotation of the flaps. The CAPS system can be used
to compare various flap transposition and excision options, and
provides an environment that allows the surgeon to iteratively
approach the planning problem.

4.  THE  PATIENT  MODEL 

The  model  of  the  patient  used  in  the  CAPS  system  is  a  combina¬ 
tion  of  patient  specific  geometric  data  and  a  generic  mechanical 
model  of  the  soft  tissue. 

Sources  of  Patient  Geometric  Data 
The  patient  specific  geometry  we  have  used  to  date  is  derived  from 
either  a  Cyberware  surface  scan  of  the  patient[8]  or  from  a  CT 
scan.  The  Cencit  scanner  system  is  also  a  promising  technology  for 
use  in  this  application[29].  The  mesh  generation  algorithms  make 
use of a cylindrically-mapped range image of the type produced by
the Cyberware and Cencit scanners. In order to create a solid
model of the skin, our current system assumes a constant soft
tissue thickness when working with this type of data. Full volumetric
scans (CT or MR scans) of the patient provide enough information
to create a solid model with the appropriate variation in soft tissue
thickness. We have experimented with some techniques for
building models directly from volumetric scans[25]; however, we feel
that the surface scanners will be more appropriate for use in plastic
surgery because of the time, expense, and radiation hazards
associated with volumetric scanners. In the future we will be working on
techniques  for  creating  a  generic  map  of  facial  soft  tissue  thickness 
in  order  to  generate  more  accurate  solid  models  from  surface  scan 
data. 
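
As a rough illustration of the constant-thickness assumption, the sketch below turns a cylindrically-mapped range image into an outer skin surface and an inner surface. It is not the CAPS mesh generator: the function name, the choice of z as the cylinder axis, the uniform sampling in angle and height, and the purely radial inward offset are all assumptions.

```python
import math


def shell_from_range_image(radii, z_min, z_max, thickness):
    """Convert radii[row][col] into outer and inner Cartesian points.

    Rows are assumed to step along the cylinder axis (z) and columns
    around it (theta). The inner surface is offset radially inward by a
    constant thickness, a simplification of offsetting along the true
    surface normal.
    """
    rows, cols = len(radii), len(radii[0])
    outer, inner = [], []
    for i in range(rows):
        z = z_min + (z_max - z_min) * i / (rows - 1) if rows > 1 else z_min
        for j in range(cols):
            theta = 2.0 * math.pi * j / cols
            r_out = radii[i][j]
            r_in = max(r_out - thickness, 0.0)
            outer.append((r_out * math.cos(theta), r_out * math.sin(theta), z))
            inner.append((r_in * math.cos(theta), r_in * math.sin(theta), z))
    return outer, inner


# Tiny made-up scan: one row of four samples at radius 2.
outer, inner = shell_from_range_image([[2.0, 2.0, 2.0, 2.0]], 0.0, 1.0, 0.5)
```

A thickness map, as proposed for future work in the text, would replace the single `thickness` value with a per-sample lookup.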

Model  of  Soft  Tissue  Biomechanics 

The finite element method is a well established technique for
biomechanical analyses[12] and provides a basis for detailed
modeling of skin nonlinearities[11]. Finite element methods can also be
used to model the shape changes and force generating properties of
other parts of the body, such as the muscles[6]. Although we use a
relatively simple linear solution technique in the CAPS system, the
user interface and mesh generation techniques described below
can be used directly with a nonlinear finite element back end. The
finite element module of the CAPS system uses the
displacement-based formulation to solve the elasticity equilibrium equations.
The implementation closely follows the procedure described in
Bathe[1]. Readers are referred to Bathe's excellent text for further
details on the implementation of finite element codes.

Visualization  of  the  Finite  Element  Model 
The two components of the patient model, the scan of the patient
and the finite element mesh, exist at different resolutions. A typical
Cyberware patient scan contains 512x256 range and color samples,
while the finite element meshes we can easily simulate contain
only 50 elements, with each element covering approximately a
square inch of skin. In order to display the full resolution of the
original scan data both before and after the finite element solution
(corresponding to pre- and post-operative conditions), we use the
following texture and displacement mapping technique. First, we
subdivide the outer face of each element into micropolygons (the
outer face being the one which lies on the skin surface). The position
of each micropolygon vertex is transformed back into cylindrical
coordinate space, and the θ and z coordinates are used to
sample  the  Cyberware  range  and  color  data  (a  bilinear  interpola¬ 
tion  is  used  to  sample  between  pixels).  The  color  value  is  stored  as 
the  vertex  color  of  the  micropolygon  vertex.  The  sampled  range 
point  is  transformed  back  into  cartesian  space  and  used  as  the  posi¬ 
tion  of  the  micropolygon  vertex.  The  user  can  select  the  number  of 
micropolygons  created  for  each  element  and  thus  can  visualize  the 
full  resolution  of  the  Cyberware  data.  We  maintain  a  data  structure 
for  each  micropolygon  vertex  in  which  we  store  the  vector  from 
the  point  on  the  surface  of  the  element  to  the  corresponding  posi¬ 
tion  on  the  range  data. 
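
The sampling step above can be sketched in a few lines. This is a minimal illustration, not the CAPS code: it assumes the cylinder axis is z, that image columns span θ and rows span z, and the names `bilinear_sample` and `snap_to_scan` are hypothetical.

```python
import math

def bilinear_sample(image, u, v):
    """Sample a 2D grid (list of rows) at fractional column u and row v."""
    rows, cols = len(image), len(image[0])
    u = min(max(u, 0.0), cols - 1.0)   # clamp to the image bounds
    v = min(max(v, 0.0), rows - 1.0)
    c0, r0 = int(u), int(v)
    c1, r1 = min(c0 + 1, cols - 1), min(r0 + 1, rows - 1)
    fu, fv = u - c0, v - r0
    top = image[r0][c0] * (1 - fu) + image[r0][c1] * fu
    bottom = image[r1][c0] * (1 - fu) + image[r1][c1] * fu
    return top * (1 - fv) + bottom * fv

def snap_to_scan(point, range_image, z_min, z_max):
    """Move a Cartesian point radially onto the scanned surface.

    Assumption: columns map linearly to theta in [0, 2*pi) and rows map
    linearly to z in [z_min, z_max].
    """
    x, y, z = point
    theta = math.atan2(y, x) % (2.0 * math.pi)
    rows, cols = len(range_image), len(range_image[0])
    u = theta / (2.0 * math.pi) * (cols - 1)
    v = (z - z_min) / (z_max - z_min) * (rows - 1)
    r = bilinear_sample(range_image, u, v)
    # Back to Cartesian space with the sampled radius.
    return (r * math.cos(theta), r * math.sin(theta), z)
```

The same lookup, applied to a color image instead of the range image, would give the vertex color.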

This  vector  is  then  used  to  display  the  full  resolution  post-oper¬ 
ative  model.  The  output  of  the  finite  element  solution  is  a  set  of 
displacements  for  each  nodal  point  in  the  finite  element  mesh. 
These  nodal  displacements  are  interpolated  through  the  element  to 
define  a  displacement  vector  at  each  point  in  the  element.  Thus,  for 
each  micropolygon  vertex,  there  is  a  displacement  vector.  By  add¬ 
ing  the  finite  element  displacement  vector  to  the  range  data  dis¬ 
placement  vector,  we  can  generate  post-operative  images  using  the 
full  resolution  of  the  original  scan.  Images  generated  using  this 
method  are  shown  in  figure  7. 
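
As a rough sketch of this composition, the following interpolates nodal displacements across one element face and adds the stored range-data offset. It is a simplification: the CAPS elements have 20 nodes, whereas this example uses only the four corner nodes of the outer face with bilinear shape functions, and all names are hypothetical.

```python
def shape_functions(s, t):
    """Bilinear weights for the four face corners, with s, t in [-1, 1]."""
    return [0.25 * (1 - s) * (1 - t), 0.25 * (1 + s) * (1 - t),
            0.25 * (1 + s) * (1 + t), 0.25 * (1 - s) * (1 + t)]

def interpolate(nodal_vectors, s, t):
    """Interpolate four 3D nodal vectors at local coordinates (s, t)."""
    w = shape_functions(s, t)
    return tuple(sum(wk * vec[axis] for wk, vec in zip(w, nodal_vectors))
                 for axis in range(3))

def post_operative_vertex(face_nodes, nodal_displacements, s, t, range_offset):
    """Element surface point + interpolated FE displacement + scan offset."""
    base = interpolate(face_nodes, s, t)
    disp = interpolate(nodal_displacements, s, t)
    return tuple(b + d + o for b, d, o in zip(base, disp, range_offset))
```

Setting all nodal displacements to zero reproduces the pre-operative vertex, which is why the offset vector only needs to be computed once per micropolygon vertex.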

5.  SPECIFYING  THE  PLAN 

The  heart  of  the  interactive  system  is  the  user  interface  which 
allows  the  surgeon  to  input  the  parameters  of  the  surgical  proce¬ 
dure.  For  this  task,  we  selected  an  interface  based  on  a  combina¬ 
tion  of  2D  and  3D  computer  graphics  techniques  using  the  X 
Window  System  with  the  Motif  toolkit,  and  on  a  set  of  3D  interac¬ 
tion  tools  built  on  top  of  the  Starbase  graphics  library  from 
Hewlett  Packard.  The  CAPS  system  is  built  on  top  of  the  bolio 
simulation  system[32].  The  clinician  is  presented  with  an  X  Win¬ 
dow  System  screen  containing  a  menu  bar  and  buttons,  and  a  3D 
graphics  window  showing  a  rendered  image  of  the  geometric 
model  of  the  patient.  The  user  controls  the  3D  view  of  the  patient 
model  and  modifies  other  rendering  parameters  using  the  mouse. 
The  user  interface  also  allows  the  surgeon  to  switch  between  the 
pre- and post-operative patient geometry, or to animate the transition
between them.

Mouse  actions  are  used  to  select  points  on  the  rendered  image 
of  the  patient.  These  points  are  used  to  define  the  incision  lines  on 
the skin surface and the tissue to be excised. The system converts
this into a data structure for subsequent use by the mesh generator.

Operating on the Surface

Planning the operation on the skin surface requires a technique for
mapping selections on the screen window back onto the surface of
the object, i.e., a mouse click on the window should pick a point on
the patient model which appears directly beneath the mouse location.
For use in their 3D object painting system, Hanrahan and
Haeberli describe a technique for hardware-assisted calculation of
this location that makes use of an object ID buffer[13].

Since  our  graphics  hardware  did  not  support  this  feature,  we 
implemented  this  operation  with  ray  tracing  as  follows.  A  ray  is 
cast from the view point to the selected point on the view plane
and is intersected with a polyhedral reconstruction of the scan data.


The  polyhedron  is  created  by  making  vertices  at  the  scan  data  sam¬ 
ple  points  (transformed  from  source  data  space  to  world  coordi¬ 
nates)  and  connecting  each  set  of  four  adjacent  vertices  with  a 
polygon.  This  operation  requires  checking  the  ray  against  each  of 
the  polygons  in  the  polyhedral  reconstruction.  To  reduce  the  num¬ 
ber  of  polygons,  a  filtered  version  of  the  source  data  is  used.  The 
operation  could  be  made  more  efficient  with  octree  sorting  of  the 
polygons  or  other  ray  tracing  optimizations.  It  turned  out  that  we 
did  not  need  to  explore  this  since  the  point  is  picked  on  a  2D 
image,  and  feedback  can  be  given  instantly  when  the  button  is 
pressed;  the  system  can  then  be  calculating  the  3D  intersection  in 
the  background  while  the  user  is  selecting  the  next  point. 
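
A brute-force version of this picking test might look like the sketch below. The paper does not say which intersection routine was used; this example substitutes the well-known Möller-Trumbore ray/triangle test and splits each quadrilateral into two triangles. Function names are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit, or None if the ray misses."""
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:              # ray is parallel to the triangle plane
        return None
    tvec = _sub(origin, v0)
    u = _dot(tvec, p) / det
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(tvec, e1)
    v = _dot(direction, q) / det
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) / det
    return t if t > eps else None

def pick_surface_point(origin, direction, quads):
    """Check the ray against every quad and return the nearest hit point."""
    best = None
    for a, b, c, d in quads:
        for tri in ((a, b, c), (a, c, d)):
            t = ray_triangle(origin, direction, *tri)
            if t is not None and (best is None or t < best):
                best = t
    if best is None:
        return None
    return tuple(o + best * k for o, k in zip(origin, direction))
```

Because every polygon is tested, the cost grows linearly with the number of quads, which is the reason the text mentions using a filtered version of the source data.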

After  a  set  of  points  on  the  surface  is  created,  it  is  useful  to  be 
able  to  pick  a  point  by  clicking  the  mouse  on  that  point.  Again,  we 
chose  a  ray  tracing  approach  to  select  the  nearest  point  to  the  ray 
from  the  view  point  through  the  picked  point  on  the  view  plane. 
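
Selecting an existing point in this way reduces to a point-to-ray distance test. The sketch below is one plausible reading of that step; the function names are hypothetical and the ray direction need not be normalized.

```python
import math

def distance_to_ray(point, origin, direction):
    """Distance from a 3D point to the ray origin + t * direction, t >= 0."""
    w = [p - o for p, o in zip(point, origin)]
    dd = sum(d * d for d in direction)
    t = max(0.0, sum(wi * di for wi, di in zip(w, direction)) / dd)
    closest = [o + t * d for o, d in zip(origin, direction)]
    return math.dist(point, closest)

def pick_nearest_point(points, origin, direction):
    """Index of the candidate point that lies closest to the picking ray."""
    return min(range(len(points)),
               key=lambda i: distance_to_ray(points[i], origin, direction))
```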

Defining  a  Hole:  Incision 

An  incision  through  the  skin  is  topologically  a  hole,  but  geometri¬ 
cally  it  is  infinitesimally  thin  until  it  is  deformed  by  the  mechanical 
simulation.  Rather  than  requiring  the  user  to  draw  a  hole  by  enter¬ 
ing  the  points  on  both  sides  of  the  incision,  the  incision  is  entered 
by picking a sequence of points corresponding to the cutting path
of  the  scalpel.  This  list  of  points  is  then  converted  into  a  loop  of 
points  describing  the  hole.  Figure  2  illustrates  this  mapping.  The 
points  are  entered  by  selecting  locations  on  the  skin  surface  using 
the  screen  space  to  skin  surface  transformation  described  in  the 
previous section. The incision line can be modified by picking one
of  the  points  and  moving  it. 

Figure 2. This figure shows the relationship between the cutting
path entered by the surgeon and the boundary of the incision
hole. The surgeon selects the points 0, 1, 2, and 3 to define a
simple Z-plasty incision. The system adds points 4 and 5 coincident
with points 2 and 1. The boundary of the hole is then stored
as the ordered list 0, 1, 2, 3, 4, 5. Note that no tissue was
removed in the incision shown. Tissue could be removed by
interactively picking and moving the points 1, 2, 4, or 5 in order
to enclose the tissue to remove within the hole boundary.
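
The conversion from a cutting path to a hole boundary can be sketched as follows. This is an illustration of the numbering described in the Figure 2 caption; the function name and the `twin` bookkeeping are assumptions, not part of the original system.

```python
def incision_hole(path_points):
    """Turn an open cutting path into a closed hole boundary.

    Interior path points are duplicated in reverse order so that the loop
    runs down one side of the cut and back up the other. Returns the loop
    and a map from each added index to the path index it coincides with.
    """
    loop = list(path_points)
    twin = {}
    for k in range(len(path_points) - 2, 0, -1):
        twin[len(loop)] = k          # the new point coincides with point k
        loop.append(path_points[k])
    return loop, twin
```

For a four-point path this yields six boundary points, with points 4 and 5 coincident with points 2 and 1.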

Modifying  the  Hole:  Excision 

An  excision  of  tissue  is  defined  by  picking  one  of  the  points  in  the 
hole  border  and  offsetting  it  from  its  corresponding  point  on  the 
other  side  of  the  hole,  with  the  result  that  the  hole  is  no  longer 
infinitesimally  thin.  Moving  one  border  point  creates  a  quadrilat¬ 
eral,  while  moving  more  than  one  creates  an  arbitrary  polygonal 
shape.  A  simple  point  picking  algorithm  cannot  be  used  for  this 
picking  operation  because  the  two  points  on  either  side  of  the  hole 
are  coincident.  A  modified  algorithm  could  be  devised  to  distin¬ 
guish  between  coincident  points  by  determining  on  which  side  of 


Figure 1. This figure shows the node
numbering and pattern for an elliptical
excision, both before and after
wound closure. The surgeon originally
enters the points 0, 1, 2, and 3.
The system then adds points 4 and 5,
initially coincident with 2 and 1. The
surgeon then moves points 4 and 5
to enclose the excision region.
the incision line the user picks. In our current prototype a menu
selection is used to indicate the point to be moved.

Closing the Hole: Suturing

Suturing refers to the sewing together of edges of the incision. In
the finite element simulation, this is accomplished by suture constraint
equations for the individual nodes in the continuum mesh.
Even for a simple wound closure, dozens of pairs of nodes must be
constrained together in order to suture the entire wound. Selecting
each pair of nodes by hand would be unnecessarily tedious.
Instead, the continuum mesh generator automatically creates a list
of nodes to be sutured from a description of which edges of the
hole border are to be brought together. Figure 1 shows the pre- and
post-operative topology desired for a simple excision. For this
configuration, the edge sutures are specified as ((0,1), (0,5)), ((1,2),
(5,4)), and ((2,3), (4,3)). When the same point is included in both
of the edges to be sutured, the mesh generator recognizes this as a
corner being closed and does not define any sutures for the nodes
corresponding to that point. The suture edges for the Z-plasty
shown  in  figure  2  are  ((5, 0),  (5, 4)),  ((2, 1),  (2, 3)),  and  ((0, 1),  (3, 
4)).  In  the  CAPS  system,  the  suture  edges  are  specified  by  select¬ 
ing  a  menu  item  corresponding  to  the  type  of  surgical  procedure 
being  performed  (e.g.  elliptical  excision  or  Z-plasty).  This  tech¬ 
nique  works  because  the  suture  relationships  depend  only  on  the 
pre-defined  topology  of  the  procedure  and  not  the  interactively 
specified  geometry.  The  menu  item  approach  has  the  advantage 
that  the  suture  conditions  do  not  need  to  be  re-entered  for  each 
simulation  of  the  same  surgical  procedure. 
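
At the level of hole-border points, the expansion of edge sutures into point pairs might be sketched as below. It assumes that the two edges of each suture are listed with corresponding endpoints in the same order, which is how the examples in the text read; the real mesh generator works on all nodes generated along each edge, which this sketch does not model.

```python
def suture_point_pairs(edge_sutures):
    """Expand ((a0, a1), (b0, b1)) edge sutures into point pairs.

    A point shared by both edges marks a corner being closed, so no
    suture is generated for it. Duplicate pairs are emitted only once.
    """
    pairs = []
    for (a0, a1), (b0, b1) in edge_sutures:
        for p, q in ((a0, b0), (a1, b1)):
            if p == q:
                continue             # shared corner point: nothing to suture
            if (p, q) not in pairs and (q, p) not in pairs:
                pairs.append((p, q))
    return pairs

# Edge sutures for the elliptical excision of Figure 1.
ELLIPTICAL_EXCISION = [((0, 1), (0, 5)), ((1, 2), (5, 4)), ((2, 3), (4, 3))]
```

Because the table depends only on the topology of the procedure, it can be stored once per menu item, as the text describes.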

The  drawback  of  this  menu-based  approach  is  that  in  order  to 
simulate  a  new  procedure,  the  suture  relationships  described  above 
must  be  worked  out  by  hand  and  added  to  the  user  interface  config¬ 
uration  file.  While  this  is  not  a  very  difficult  task,  a  more  flexible 
solution would be to allow the user to define the suture relationships
by selecting pairs of wound edges. The system could differentiate
between coincident edges by determining which side of the
incision line the user picked. Picking edges in the proper sequence
would then define the suture relationships for the surgical procedure.
These suture relationships could then be added to the menu
for  use  in  future  analyses. 

6.  MESH  GENERATION 

The surgical plan is entered in the CAPS system using a graphical
interface which corresponds to the way the surgeon draws on the
patient's  skin  in  the  operating  room.  An  important  part  of  this 
interface  is  the  mesh  generator,  which  creates  a  well-formed  finite 
element  mesh  directly  from  the  surgical  plan  and  the  original  scan 
of  the  patient  geometry. 

The mesh generation algorithm consists of two major steps: surface
meshing and continuum meshing. The surface meshing portion
of the algorithm grows a mesh out from the incision hole
border along the skin surface. Surface meshing is performed in a
normalized cylindrical space ignoring the r (radial) coordinate.
After  the  surface  mesh  is  generated,  the  mesh  is  snapped  back  to 
the  skin  surface  by  looking  up  the  r  coordinate  in  the  Cyberware 
range  data. 

The continuum meshing portion of the algorithm refers to the
process of creating a continuum finite element mesh representing
the skin thickness. This is accomplished by growing the surface
mesh radially in from the skin surface to the bone surface along the
r axis. Triangles are extruded into wedge elements and quadrilaterals
are extruded into cuboid elements. Edges shared by polygons in
the surface mesh are extruded into shared faces in the continuum
mesh. Each vertex in the surface mesh defines a set of nodes in the
continuum mesh which lie along the line from that vertex to the
central axis of the cylindrical space of the patient scan data. Note
that this extrusion process assumes that the incision cuts into the
skin along the r axis.

Figure 3. A surface mesh generated from a Z-plasty incision. The original
incision lines are indicated in bold. The first stage of the surface meshing algorithm
traverses the border of the incision hole and identifies the two concave regions
which become surface mesh polygons 1 and 2. The second stage of the algorithm
adds polygons 3, 4, 5, 6, 7, and 8. Polygons 5 and 8 result from vertices that were
"expanded" because they meet at too sharp an angle.

Figure 4 shows a cross section of the nodes and elements created
by the continuum mesh algorithm. Heavy lines are edges from
the  surface  mesh,  and  filled  circles  are  nodes  from  the  surface 
mesh. 

A  suture  condition  specified  between  two  edges  on  the  incision 
boundary  is  converted  into  suture  constraints  between  each  pair  of 
nodes  generated  from  those  edges.  Nodes  on  the  bottom  layer  of 
the continuum mesh which do not have suture constraints are
marked  as  fixed  in  all  three  degrees  of  freedom.  All  other  nodes  in 
the  continuum  mesh  are  unconstrained. 

Surface  Meshing  Algorithm 

The  surface  meshing  approach  used  in  the  CAPS  system  is  based 
on  the  automatic  mesh  generation  work  of  Chae  and  Bathe[3;4]. 
Their  algorithm,  which  addresses  the  problem  of  automatic  mesh¬ 
ing  of  CAD  parts  such  as  a  plate  with  holes  drilled  in  it,  works  by 
creating  layers  of  elements  along  the  borders  of  the  object  and 
working  inward  until  the  rows  meet.  We  have  modified  this 
approach  to  work  outward  from  the  incision  boundary  hole  and 
have  made  the  algorithm  create  quadrilateral  elements  wherever 
possible. 

Our algorithm consists of two stages: 1) Traverse the border of
the incision looking for angles larger than a set threshold $t_1$, convert
them to triangles in the surface mesh, and update the border. This
process continues until no more angles need to be filled. 2) Go
around the border adding a layer of quadrilaterals of thickness $l_1$; a
quadrilateral is added for each edge in the border, and an extra
quadrilateral is added at edges which join at an angle less than a
specified threshold $t_2$.

Stage 1 is implemented as follows. For each vertex $v_i$ in the border
list, examine the angle between the edges $(v_i, v_{i+1})$ and
$(v_{i+1}, v_{i+2})$.* If this angle is greater than $t_1$, add triangle
$(v_{i+2}, v_{i+1}, v_i)$ to the surface mesh (30° is the default $t_1$
threshold angle in the prototype) and delete vertex $v_{i+1}$ from the
border list. Continue this process until no more triangles are added
in a complete traversal of the border list. After stage 1, the region
defined by the border list will be nearly convex (no concavities
will be greater than $t_1$).
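
Stage 1 can be sketched as follows. The paper's exact angle convention is not recoverable from the text, so this example makes its own assumptions: the border is a list of 2D points in counter-clockwise order, and a "concavity" is measured as the clockwise turn angle at the middle vertex. The function names are hypothetical.

```python
import math

def concave_turn_deg(a, b, c):
    """Clockwise turn at b, in degrees, when walking a -> b -> c.

    Positive values mean the counter-clockwise border bends inward at b.
    """
    e1 = (b[0] - a[0], b[1] - a[1])
    e2 = (c[0] - b[0], c[1] - b[1])
    cross = e1[0] * e2[1] - e1[1] * e2[0]
    dot = e1[0] * e2[0] + e1[1] * e2[1]
    return -math.degrees(math.atan2(cross, dot))

def fill_concavities(border, t1_deg=30.0):
    """Fill concave corners with triangles until the border is nearly convex."""
    border = list(border)
    triangles = []
    changed = True
    while changed and len(border) > 3:
        changed = False
        i = 0
        while i < len(border) and len(border) > 3:
            n = len(border)
            a, b, c = border[i], border[(i + 1) % n], border[(i + 2) % n]
            if concave_turn_deg(a, b, c) > t1_deg:
                triangles.append((c, b, a))       # triangle (v_{i+2}, v_{i+1}, v_i)
                del border[(i + 1) % n]           # drop the middle vertex
                changed = True
            else:
                i += 1
    return triangles, border
```

Index arithmetic uses the modulo operator so that accesses wrap around the end of the border list, as the footnote in the text requires.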

Stage 2 has two substages: creating the new border list and joining
the new and old border lists with quadrilaterals. The first substage
proceeds as follows. Create an empty list to store the new
border. For each vertex $v_i$ in the current border list, let $n_1$ be the
outward normal from edge $(v_{i-1}, v_i)$ and $n_2$ be the outward normal
from edge $(v_i, v_{i+1})$. Examine the angle between the edges $(v_{i-1}, v_i)$
and $(v_i, v_{i+1})$. If the angle is greater than $t_2$ then add a vertex to the
new border with vertex position of $v_i + l_1 (n_1 + n_2)$. If the angle is
less than $t_2$ then mark $v_i$ as expanded, and add three vertices to the
new border with vertex positions of $v_i + l_1 n_1$, $v_i + l_1 (n_1 + n_2)$, and
$v_i + l_1 n_2$.

The second substage of stage 2 is to connect new and old border
lists with quadrilaterals as follows. Let $j$ index the new border list
and $i$ index the current border list; initialize $i$ and $j$ to zero. For each
vertex $v_i$, if $v_i$ is marked as expanded, add quadrilateral $(v_i, v_j, v_{j+1},
v_{j+2})$ and increment $j$ by two. Add quadrilateral $(v_i, v_j, v_{j+1}, v_{i+1})$.
Increment $i$ and $j$ by one. Make the new border the current border.
The entire stage 2 process is repeated once for each layer to be
added to the surface mesh. Figure 3 shows the surface mesh generated
for a Z-plasty incision.
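
One layer of stage 2 might be sketched as below. The assumptions are mine: a counter-clockwise 2D border, unit outward edge normals, and the corner angle measured between the two edges meeting at each vertex. The names `add_layer`, `thickness`, and `t2_deg` are hypothetical.

```python
import math

def outward_normal(a, b):
    """Unit normal of edge a -> b pointing away from a counter-clockwise border."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    return (dy / length, -dx / length)

def corner_angle_deg(prev, cur, nxt):
    """Angle at cur between its two border edges, in degrees (0 to 180)."""
    ax, ay = prev[0] - cur[0], prev[1] - cur[1]
    bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
    return math.degrees(math.atan2(abs(ax * by - ay * bx), ax * bx + ay * by))

def add_layer(border, thickness, t2_deg):
    """Grow one layer of quadrilaterals outward from the current border."""
    n = len(border)
    new_border, expanded = [], []
    # Substage 1: build the new border, expanding sharp corners.
    for i in range(n):
        prev, cur, nxt = border[i - 1], border[i], border[(i + 1) % n]
        n1, n2 = outward_normal(prev, cur), outward_normal(cur, nxt)
        p1 = (cur[0] + thickness * n1[0], cur[1] + thickness * n1[1])
        p12 = (cur[0] + thickness * (n1[0] + n2[0]),
               cur[1] + thickness * (n1[1] + n2[1]))
        p2 = (cur[0] + thickness * n2[0], cur[1] + thickness * n2[1])
        if corner_angle_deg(prev, cur, nxt) > t2_deg:
            new_border.append(p12)
            expanded.append(False)
        else:
            new_border.extend([p1, p12, p2])
            expanded.append(True)
    # Substage 2: join old and new borders with quadrilaterals.
    quads, j, m = [], 0, len(new_border)
    for i in range(n):
        if expanded[i]:
            quads.append((border[i], new_border[j], new_border[j + 1], new_border[j + 2]))
            j += 2
        quads.append((border[i], new_border[j], new_border[(j + 1) % m], border[(i + 1) % n]))
        j += 1
    return new_border, quads
```

Calling `add_layer` repeatedly, each time passing the returned border back in, adds one layer per call.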

Continuum  Meshing  Algorithm 

Generation  of  the  continuum  mesh  from  the  surface  mesh  is 
accomplished  by  extruding  the  surface  mesh  inward  along  the  r 
axis to form solid elements and then making a mapping from vertices
and polygons in the surface mesh to nodes and elements in the
continuum  mesh.  First  we  look  at  the  numbering  of  nodes  in  the 
standard  isoparametric  element,  then  we  look  at  the  numbering  of 
the  vertices  and  edges  in  the  surface  mesh,  and  then  at  the  corre¬ 
spondence  between  these  numbering  schemes.  The  continuum 
meshing  algorithm  converts  the  surface  mesh  into  an  arbitrary 
number  of  layers  of  elements,  each  layer  being  of  an  arbitrary 
thickness. 

Figure 5 shows the standard finite element used in the CAPS
system. The algorithm must generate elements with the proper
node ordering. Nodes 0-3, called the top_nodes, are the corners of
face 0; nodes 4-7, called the bottom_nodes, are the corners of face
1; nodes 8-11, called the top_mid_nodes, are the nodes in the middles
of the edges on the top face; nodes 12-15, called the
bottom_mid_nodes, are the nodes in the middle of the edges on the
bottom face; nodes 16-19, called the center_nodes, are the nodes in
the center of the edges joining face 0 to face 1.

In  the  surface  mesh  we  have  a  set  of  vertex  points  connected  by 
a  set  of  polygons.  Each  polygon  has  a  list  of  the  vertices  which 
defines  its  shape.  An  edge  of  the  polygon  is  defined  by  each  pair  of 
vertices  in  the  list  and  by  the  last  and  first  vertices  in  the  list.  A 
data  structure  is  maintained  for  each  layer  of  elements  which  keeps 
track of the numbering of nodes in the layer. As each node is created,
its position is calculated and its index in the list of nodes for
the structure is recorded in the layer data structure.

For the top layer of elements, the top_nodes are positioned at
the points of the surface mesh vertices. The positions of the
top_mid_nodes of the top face are calculated by taking the midpoints
of each polygon edge and offsetting those points to lie on
the skin surface. The positions of the bottom_nodes are calculated
by offsetting the positions of the top_nodes in r by the thickness of
the layer. The positions of the bottom_mid_nodes are calculated by
offsetting the positions of the top_mid_nodes by the thickness of
the layer. The positions of the center_nodes are calculated by offsetting
the positions of the top_nodes by one half the layer thickness.
For continuum meshes with more than one layer of elements


* Accesses to vertices in the node list wrap around if the index is greater
than the length of the list. Similarly, negative indices wrap back to the
end of the list.

Figure 4. Two elements from a continuum mesh. This shows the relationship
between the surface mesh polygons (corresponding to the top faces shown in
bold) and the continuum elements. The continuum mesh algorithm generates
elements extruded along the r axis (into the skin) following the topology defined
by the surface mesh. The bottom layer of nodes are constrained to remain
fixed to represent the bony support. The figure shows a single layer of 20
node elements.


Figure 5. Node numbering for the standard 20 node isoparametric element
used in the CAPS system.


in  the  r  direction,  subsequent  layers  of  elements  are  generated  in 
an analogous manner with the exception that rather than creating
new  nodes  for  the  top_nodes  and  the  top_mid_nodes,  the  indices 
of  the  previous  layer's  bottom_nodes  and  bottom_mid_nodes  are 
copied  instead. 
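
The radial placement of nodes below one surface vertex can be illustrated with a short sketch. It works only on the r coordinate in cylindrical space and assumes that r decreases toward the bone; the function name is hypothetical.

```python
def radial_node_stack(r_skin, layer_thicknesses):
    """Radii of the nodes generated below one surface mesh vertex.

    corner_r[k] is the top corner node of layer k and corner_r[k + 1] is
    its bottom corner node, which layer k + 1 reuses as its top node.
    center_r[k] is the mid-height node on the edge joining the two faces.
    """
    corner_r = [r_skin]
    center_r = []
    r = r_skin
    for thickness in layer_thicknesses:
        center_r.append(r - 0.5 * thickness)   # half-way through the layer
        r -= thickness
        corner_r.append(r)                     # shared with the next layer
    return corner_r, center_r
```

The same offsets apply to the mid-edge nodes, starting from the skin-surface position of each edge midpoint instead of a vertex.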

Once all the nodes have been created, the elements must be created.
One element per layer is created for each polygon in the surface
mesh. These elements must contain a correctly ordered list of
the node indices. This list of indices for the top_nodes is obtained
by looping through the vertices of the polygon and looking up the
node indices from the data structure of the layer corresponding to
the top of the element. The indices for the bottom_nodes and the
center_nodes are obtained in the same manner, but using the
appropriate node indices from the layer data structure. The list of
indices for the top_mid_nodes and the bottom_mid_nodes are
found  by  looping  over  the  edge  list  for  each  polygon  and  finding 
that  edge's  index  in  the  list  of  edges  for  the  surface  mesh;  that 
index  is  then  used  to  find  the  appropriate  node  index  by  looking  up 
the  node  in  the  appropriate  layer  data  structure. 

Triangles in the surface mesh are handled as a special case by
creating wedge shaped elements. This can be accomplished by collapsing
one of the side faces of the isoparametric element. In this
case, only 15 nodes are created for the element, and a shared node
index  is  used  for  nodes  2,  10,  and  3,  for  nodes  18  and  19,  and  for 
nodes  6, 14,  and  7. 
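
The collapse can be expressed as a small remapping of the 20 local node slots. This sketch uses the three groups named in the text; the function name and the choice to keep the first index of each group are assumptions.

```python
# Local node slots that share one global node in a wedge element.
COLLAPSE_GROUPS = [(2, 10, 3), (18, 19), (6, 14, 7)]

def collapse_to_wedge(hex_nodes):
    """Return a 20-entry connectivity list in which each collapsed group
    of local slots refers to the same global node index."""
    nodes = list(hex_nodes)
    for group in COLLAPSE_GROUPS:
        shared = nodes[group[0]]
        for local in group[1:]:
            nodes[local] = shared
    return nodes
```

Five of the twenty slots become duplicates, leaving the 15 distinct nodes mentioned above.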

7.  RESULTS 

To  date  the  system  has  been  used  in  two  ways.  We  have  been  able 
to  use  the  system  to  simulate  a  number  of  plastic  surgeries  of  the 
face and have obtained good visual match between the simulation


results  and  post-operative  photos  of  actual  patients.  In  addition,  we 
have  shown  the  system  to  over  a  dozen  practicing  plastic  surgeons 
and  have  obtained  very  positive  feedback.  Surgeons  have  noted, 
for  example,  that  this  system  is  completely  different  than  any  cur¬ 
rent  form  of  surgery  planning  because  it  contains  an  actual  model 
of  the  elasticity  of  the  skin.  This  critical  feature  is  missing  from 
most  current  planning  techniques  such  as  drawings  or  paper  mod¬ 
els.  The  other  planning  techniques  which  do  have  some  model  of 
skin  elasticity  (namely  cadavers  or  animal  models)  do  not  allow 
easy  iterative  design  of  the  procedure. 

8. FUTURE WORK

Physical  modeling  of  human  soft  tissue  presents  many  challenges 
which  can  only  be  addressed  by  making  simplifying  assumptions 
about  the  behavior  of  the  tissue.  The  complexity  of  the  tissue 
includes  the  fact  that  it  is  alive,  that  it  has  a  complex  structure  of 
component materials, and that its mechanical behavior is nonlinear[17;23].
In the design of the CAPS system, we have attempted to
model those features of the tissue which have direct bearing on the
outcome of plastic surgery, but in doing so we ignore the following
effects: the physiological processes of healing, growth, and aging
are not included in the model; the multiple layers of material
which make up the skin are idealized as a single elastic continuum;
and the system uses only a linear model of the mechanical behavior
of the tissue and does not include a model of the pre-stress in
the tissue (i.e., the skin does not open up when cut). Under these
assumptions, the model gives an estimate of the instantaneous state
of the tissue after the procedure has been performed.

Figure 6. A screen image of the CAPS system in operation showing the patient model and the interactively defined surgical plan.

These  assumptions  could  be  relaxed  to  build  a  more  complete 
model  of  tissue  behavior.  The  complex  structure  of  the  tissue  could 
be  addressed  by  creating  a  more  detailed  finite  element  mesh  with 
multiple  layers  of  differing  material  properties.  The  nonlinear 
mechanical response of the tissue could be better approximated
using  a  nonlinear  finite  element  solution  technique.  Both  of  these 
improvements  will  make  the  solution  process  more  computation¬ 
ally  complex,  but  will  become  more  feasible  as  computers  become 
faster.  We  plan  to  perform  a  series  of  clinical  trials  to  identify  the 
parameters  which  have  the  most  influence  in  the  surgical  result  and 
to  obtain  accurate  estimates  of  the  elastic  and  viscous  moduli  of 
the  soft  tissue. 

The  incorporation  of  physiological  processes  presents  a  more 
fundamental  problem,  since  the  processes  themselves  are  not  well 
understood.  In  this  realm,  the  physical  modeling  approach  offers  a 
possible method for determining the action of these processes. For
example,  if  the  physical  model  is  calibrated  such  that  it  gives  a 
nearly  exact  prediction  of  the  immediate  post-operative  state  of  the 


tissue,  then  subsequent  changes  in  the  patient’s  skin  due  to  healing 
could  be  determined  by  changing  the  material  property  assump¬ 
tions  of  the  model  until  it  again  matches  the  skin.  It  is  possible  that 
this  analysis  would  lead  to  a  method  of  predicting  the  effect  of 
healing  which  could  then  be  included  in  the  planning  system. 

The  field  of  plastic  surgery  simulation  is  still  very  new  and 
there  are  many  promising  directions  for  future  work.  For  example, 
more  work  is  needed  to  improve  modeling  of  the  soft  tissue  to 
more  accurately  model  its  nonlinear  mechanical  response  and  its 
long  term  physiological  changes.  In  the  future,  we  would  also  like 
to  see  improved  user  interface  techniques  to  give  the  surgeon  more 
control over the direction and depth of the incisions. The current
incision  technique  is  adequate  for  planning  surface  incisions,  but 
cannot  be  used  for  internal  surgery. 

9.  CONCLUSIONS 

Simulation  of  plastic  surgery  presents  many  challenging  problems 
which  can  be  addressed  by  interactive  3D  graphics  techniques. 
Each patient presents the surgeon with a unique set of problems for
which  there  are  many  possible  courses  of  action.  The  surgeon’s 
goal  is  to  optimize  the  rearrangement  of  tissue,  to  correct  the  tissue 
deficiency,  and  to  minimize  distortion  of  the  surrounding  tissue. 


The surgical plan must take into account the complex geometry
and mechanical behavior of the soft tissue.

Figure 7. A screen image of the CAPS system in operation showing the simulated results of the operation.

In  this  paper  we  have  shown  how  a  task  level  analysis  of  the 
plastic surgery planning problem has guided our development and
implementation  of  a  computer-aided  plastic  surgery  system.  The 
user  interface  techniques  and  mesh  generation  algorithms  we  have 
presented  directly  address  the  requirements  of  the  task  without 
burdening  the  surgeon  with  the  implementation  details  of  the  finite 
element  model.  Our  approach  has  been  well  received  by  clinicians, 
who  report  that  they  would  be  comfortable  using  this  system  to 
plan operations. However, before we take that step, we will be putting
the software through a series of clinical trials to validate the
simulation  results  through  retrospective  analysis  of  case  histories. 
Abstract

3dm is a three dimensional (3D) surface modeling
program  that  draws  techniques  of  model  manipulation  from 
both  CAD  and  drawing  programs  and  applies  them  to 
modeling  in  an  intuitive  way.  3dm  uses  a  head-mounted 
display  (HMD)  to  simplify  the  problem  of  3D  model 
manipulation  and  understanding.  A  HMD  places  the  user  in 
the  modeling  space,  making  three  dimensional  relationships 
more  understandable.  As  a  result,  3dm  is  easy  to  learn  how  to 
use  and  encourages  experimentation  with  model  shapes. 

1  Introduction 

The use of interactive 3D environments has
increased the demand for complex 3D models.[9] The 3D
environments  that  provide  a  sense  of  telepresence  or  “virtual 
reality"  require  a  large  number  of  models  in  order  to  give  the 
user the illusion of being in a specific place. This demand for
more  models  has  highlighted  the  fact  that  most  modeling 
systems  are  difficult  to  use  for  all  but  a  small  number  of 
experts.[9]  Through  identification  and  removal  of  some  of 
the  fundamental  obstacles  to  modeling  we  hope  to  make  it 
accessible  to  more  users. 

Typical  techniques  used  to  select  and  display 
objects  are  a  major  hindrance  to  3D  modeling.[3]  To  place  an 
object  in  3D  requires  six  parameters:  the  position  (three)  and 
the  orientation  (three).  Most  modeling  systems  (modelers) 
must  settle  for  a  2D  mouse  augmented  by  a  keyboard  for  this 
purpose.  This  mismatch  results  in  difficult  placement  and 
picking  of  objects  in  modeling  space.  The  display  of  models 
usually  takes  the  form  of  a  projection  onto  a  2D  monitor. 
This  has  the  effect  of  making  spatial  relationships  unclear. 
Technological  improvements  to  3D  model  display  and 
manipulation  hardware  can  remove  these  barriers  to  model 
creation  and  understanding. 

Current virtual reality technology provides one
solution to more intuitive modeling. A HMD system gives
the ability to understand complex spatial relationships of
models by placing the user in the model’s world. Within this
type of system, a hand-held pointing device supplies users
with the ability to specify 3D relationships through direct
3D manipulation. As a result, the user can build the virtual
world  from  within  the  virtual  world. 

Our  source  of  inspiration  for  designing  a  user 
interface  for  a  HMD-based  modeler  is  the  current  software 
used  for  2D  modeling.  At  one  time,  creating  2D  models 
required  cumbersome  CAD  programs.  This  software  took  a 
long  time  to  learn  and  often  did  not  provide  real-time 
interaction.  Now,  however,  2D  drawings  can  be  manipulated 
by  even  the  most  casual  users  of  personal  computers.  This 
revolution is in part the result of intuitive drawing programs
like MacDraw. One of the keys to MacDraw's success is its
inherent simplicity. Most work done with it requires no
reading or use of the keyboard. Rather, it provides a palette
of tools which is always available next to the model. To
change modes, the user simply selects the tool from the
palette  using  the  mouse.  The  process  of  3D  modeling  can 
become  more  accessible  if  some  of  the  lessons  learned  from 
this  evaluation  of  2D  modeling  can  be  applied  to  3D 
modeling  systems. 

This  paper  presents  a  HMD-based  system  called 
3dm  which  simplifies  the  task  of  3D  modeling  by 
implementing the concepts introduced above. Basic
techniques for working within 3dm's virtual world are
described to show how users access the various features. The
implementation of 3dm is described through a presentation
of its most useful commands. Finally, the results of actually
using 3dm are presented with an emphasis on new techniques
that can be applied within other virtual worlds.

2  Prior  Work 

A large body of work has been done on 3D
modeling. Although 3D input devices have been used to
enhance modelers, very little modeling has been done with a
HMD. Some examples of modeling with six degree-of-freedom
input devices are [1] and [8], but both of those used
traditional 2D displays. Previous uses of HMD systems have
concentrated more on exploration of virtual worlds than on
creating or modifying them. Some examples of this
work with HMDs can be found in [5].

Modeling  using  a  HMD  system  has  been  explored 
by Clark.[4] Users of Clark’s system created parametric
surfaces by manipulating control points on a wire-frame grid.
This system highlighted the utility of using a HMD for
improved understanding and interaction with models. Like
Clark's system, 3dm relies on a HMD to help simplify
modeling, but 3dm's intuitive user interface design also
makes it easy to learn and use.

3 Implementation

3dm was developed using a VPL EyePhone as the
display device and Polhemus trackers to track the head and
hand. A 6D 2-button mouse, developed at UNC-CH, was the
input device. The images were rendered using the
Pixel-Planes 4 and Pixel-Planes 5 high-performance graphics
engines developed at UNC-CH.[6][7] Currently, all models
created with 3dm are made up of hierarchical groups of
triangles.

3.1 User Interface

In  addition  to  the  model,  the  virtual  world  of  3dm 
contains  the  components  of  the  user  interface.  The  most 
important of these are the toolbox and the cursor. The cursor
follows  the  position  of  the  hand-held  mouse,  giving  the  user 
a  sense  of  hand  position  in  the  modeling  space.  The  toolbox 
is  the  means  by  which  most  actions  are  performed. 

Some  of  the  user  interface  components  are  simply 
helpful  markers  that  can  be  turned  off,  unlike  the  toolbox  and 
the  cursor,  which  are  always  visible.  The  user  stands  on  a 
“magic carpet” which marks the boundaries of where the
tracking  system  operates.  Remaining  within  tracker  range  is 
important  because  the  virtual  world  will  begin  to  tilt  as  the 
user  moves  farther  out  of  range.  Below  the  magic  carpet  lies  a 
checkered  ground  plane,  above  which  the  model  is  usually 
created.  Additional  reference  objects,  such  as  coordinate 
axes, can be turned on by the user.

Figure 1: The toolbox as seen by the user.

The  toolbox  initially  appears  suspended  in  space 
near  the  user’s  waist,  but  it  can  be  moved  to  a  more 
convenient  location.  The  toolbox  remains  attached  to  the 
user  as  he  or  she  moves  around  the  modeling  space,  or  it  can 
be disconnected and left anywhere above the magic carpet.
The toolbox is organized into cells containing 3D icons.
Each  icon  represents  either  a  tool,  a  command,  or  a  toggle. 
Many  of  these  icons  can  optionally  appear  in  pulldown 
menus  at  the  top  of  the  toolbox  in  order  to  reduce  clutter. 

Icons perform actions when they are selected with
the cursor. Tools change the current mode of operation as
reflected in the shape of the cursor. For instance, when the
user reaches into the toolbox and selects the flying tool, the
cursor takes the form of an airplane. Selecting a command
performs a single task without changing the current mode of
operation. Toggles change some global aspect of 3dm. An
example is the snap-to grid toggle, which restricts cursor
movement to a 3D grid when it is on.

Exploring the model provides understanding of its
3D shape, so 3dm supports multiple methods of navigating
in the modeling space. The HMD system used for 3dm allows
the user to walk through the model space a few paces in any
direction. Walking simply does not provide the range of
movement needed for most models, so 3dm supports
“flying,” a commonly used method of traveling through
virtual worlds.[2] Flying consists of translating the user
through model space in the direction that the cursor is
pointing. Flying moves the magic carpet, which carries the
user and the toolbox along. A method of navigation that is
the complement of flying is “grabbing” the world. Grabbing
the  world  allows  the  user  to  attach  the  modeling  space  to  the 
cursor  and  then  drag  and  rotate  it.  Grabbing  can  be  used  to 
bring  a  feature  of  the  world  to  the  user  rather  than  forcing  the 
user  to  walk  or  fly  to  the  feature. 
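
As an illustration only, one frame of flying can be sketched as a translation of the magic carpet along the normalized cursor direction. The function name, the speed parameter, and the per-frame time step are assumptions made for this sketch and are not taken from 3dm.

```python
import math


def fly_step(carpet_origin, cursor_direction, speed, dt):
    """Move the carpet origin along the direction the cursor points.

    carpet_origin    -- (x, y, z) position of the magic carpet
    cursor_direction -- (x, y, z) pointing direction, any non-negative length
    speed, dt        -- hypothetical flying speed and frame time
    """
    length = math.sqrt(sum(c * c for c in cursor_direction))
    if length == 0.0:
        # No pointing direction: the carpet stays where it is.
        return tuple(carpet_origin)
    return tuple(p + speed * dt * c / length
                 for p, c in zip(carpet_origin, cursor_direction))
```

Because the user and the toolbox are carried by the carpet, moving this one origin is enough to move both in such a scheme.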

Models  often  require  manipulation  at  vastly 
different  scales.  To  facilitate  this  type  of  work,  the  user  can 
be  scaled  using  a  process  called  growing  and  shrinking.  This 
scaling  does  not  affect  the  model;  it  changes  the  user's 
relative  size  with  respect  to  the  model.  The  user  could  shrink 
down  to  bird  size  in  order  to  add  eyelashes  to  a  model  of  an 
elephant  and  then  grow  to  the  size  of  a  house  to  alter  the 
same  model's  legs.  Since  the  user  can  become  disoriented  by 
all of these methods of movement, there is a command that
immediately returns the user to the initial viewpoint in the
middle of the modeling space.

The  user  receives  continuous  feedback  in  a  variety 
of  ways.  The  HMD  system  provides  all  visual  input  to  the 
user, so the display must be updated between 15 and 30 times
per second. Even during file loading and other slow
operations, the screen is updated and the head is tracked.
Rubber banding is implemented in many situations: when
defining  a  new  triangle,  scaling  or  moving  an  object,  and 
extruding.  Predictive  highlighting  shows  the  user  what 
would  be  selected  if  a  mouse  button  were  pressed.  This 
highlighting  is  used  in  the  toolbox,  and  even  more 
importantly,  when  marking  vertices.  Whenever  the  cursor  is 
near  a  model,  the  nearest  vertex  is  highlighted,  giving  the 
user  an  indication  of  which  vertex  would  be  operated  on 
before  actually  attempting  the  operation. 
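
A minimal sketch of predictive vertex highlighting follows, assuming vertices are stored as (x, y, z) tuples and that "near" means within a fixed distance threshold. The function name and the threshold are hypothetical; the text does not say how 3dm performs this search.

```python
import math


def nearest_vertex(cursor, vertices, max_distance):
    """Return the index of the vertex closest to the cursor,
    or None if no vertex lies within max_distance."""
    best_index, best_dist = None, max_distance
    for i, v in enumerate(vertices):
        d = math.dist(cursor, v)
        if d <= best_dist:
            best_index, best_dist = i, d
    return best_index
```

The returned index would be highlighted every frame, so the user sees the candidate vertex before pressing a mouse button.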


Figure  2.  A  triangle  being  added  to  a  model.  Demonstrates 
rubber banding and snapping to nearby vertices.

3.2 Tools and Commands

Although  many  tools  are  available  in  the  3dm 
toolbox,  it  is  more  useful  to  understand  the  general  classes  of 
tools  supplied  to  the  user  than  to  enumerate  all  of  the  specific 
tools.  Most  of  these  tools  were  chosen  because  of  their 
proven  utility  in  pre-existing  modelers. 

3.2.1  Surface  Creation 

Surface  creation  is  the  central  purpose  of  most  3D 
modeling,  so  3dm  provides  more  than  one  method  for 
creating  surfaces.  A  triangle  creation  tool  exists  for 
generating  both  single  triangles  and  triangle  strips.  The 
corners  of  these  triangles  are  specified  by  pointing  and 
clicking  the  mouse,  so  the  triangles  are  created  in  their 
desired locations rather than appearing in a “building” area
and then being moved into the model space. Pre-existing
vertices may be used during triangle creation to allow
triangles to share corners or entire edges, making seamless
connections  easy. 

The extrusion tool supplies a more powerful and
more specialized method of triangle creation. This tool
allows the user to either draw a poly-line or select one from
edges already in the model and stretch it out into an extruded
surface. The extrusion is performed by dragging the leading
edge  of  the  surface  with  the  mouse.  Because  the  mouse  can  be 
twisted  and  translated  arbitrarily  during  the  extrusion,  it 
becomes  easy  to  create  complex  surfaces  with  this  tool.  In 
addition,  the  leading  edge  of  this  new  surface  can  be  scaled 
and  then  extruded  again  as  many  times  as  necessary.  This 
form  of  extrusion  can  rapidly  create  such  objects  as  walls, 
legs,  tree  trunks,  and  leaves. 
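
One way to picture a single extrusion step is sketched here: the poly-line is carried to a new leading edge by an arbitrary transform (standing in for the twist, translation, or scale applied with the mouse), and each segment is joined to its counterpart with two triangles. The function and its triangulation order are assumptions for illustration, not 3dm's actual code.

```python
def extrude(polyline, transform):
    """Sweep a poly-line to a new leading edge and triangulate the strip.

    polyline  -- list of (x, y, z) points
    transform -- function mapping a point to its place on the new edge
    Returns (leading_edge, triangles).
    """
    leading = [transform(p) for p in polyline]
    triangles = []
    for i in range(len(polyline) - 1):
        a, b = polyline[i], polyline[i + 1]
        c, d = leading[i], leading[i + 1]
        # Split the quad a-b-d-c into two triangles.
        triangles.append((a, b, d))
        triangles.append((a, d, c))
    return leading, triangles
```

Passing the returned leading edge back in as the next poly-line gives the repeated short extrusions described above.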

The  last  surface  creation  tools  facilitate  creation  of 
standard  surface  shapes.  Currently  box,  sphere,  and  cylinder 
tools  exist.  They  each  allow  the  user  to  interactively  stretch 
out  an  arbitrarily  proportioned  wireframe  representation  of  a 
standard  shape.  When  the  wireframe  representation  has  the 
desired  proportions,  it  is  turned  into  a  triangulated  surface. 

3.2.2  Editing 

Since  surfaces  are  rarely  in  exactly  the  desired  shape 
upon  creation,  it  is  important  that  surface  editing  be  an  easy 
operation.  The  most  commonly  used  editing  tool  is  the 
mark/move  tool.  This  tool  provides  a  method  of  grasping 
and  moving  arbitrary  portions  of  the  model.  Not  only  can 
entire  objects  be  grabbed  and  moved  with  the  mouse,  but 
selected  groups  of  vertices  can  be  moved  in  order  to  distort 
part of an object. Scaling can also be performed on either
entire objects or groups of vertices. During both movement
and scaling, the user sees the model changing in real time.
This  interaction  decreases  the  number  of  edits  needed  to  make 
a  desired  change.  The  marking  aspects  of  this  tool  are  used  to 
mark  arbitrary  portions  of  the  model  for  operations  with 
other  tools. 

Familiar editing operations from drawing programs
are provided as a group of 3dm commands that facilitate rapid
experimental changes. An arbitrary number of triangles or
entire objects can be cut, copied, pasted and deleted. These
commands provide easy reuse of existing objects.

An undo/redo stack is provided for reversing any
number of operations from any tool or command. As
operations are performed, the changes they cause to the
model are stored in the undo/redo stack. The undo command
can  then  be  used  to  pop  changes  off  of  this  stack  to  undo  as 
many  operations  as  necessary.  These  undo  operations  can 
themselves  be  undone  with  the  redo  command.  The  undo/redo 
commands  encourage  experimental  changes  to  the  model 
because  no  operation  can  cause  permanent  damage. 
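
The undo/redo behavior described above can be sketched with two stacks of reversible changes. This is an assumed design: each change is represented as a pair of hypothetical callables (apply, revert), and performing a new operation discards the redo history, which the text does not specify.

```python
class UndoRedoStack:
    """Two-stack undo/redo over (apply, revert) callable pairs."""

    def __init__(self):
        self._undo = []
        self._redo = []

    def perform(self, apply, revert):
        """Apply a change and remember how to reverse it."""
        apply()
        self._undo.append((apply, revert))
        self._redo.clear()  # assumption: a new operation drops redo history

    def undo(self):
        """Reverse the most recent change; return False if there is none."""
        if not self._undo:
            return False
        apply, revert = self._undo.pop()
        revert()
        self._redo.append((apply, revert))
        return True

    def redo(self):
        """Reapply the most recently undone change; False if there is none."""
        if not self._redo:
            return False
        apply, revert = self._redo.pop()
        apply()
        self._undo.append((apply, revert))
        return True
```

For example, adding a triangle could be recorded as `stack.perform(lambda: model.append(tri), model.pop)`, where `model` is a plain list used only for this illustration.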

3.2.3  Hierarchy 

The  hierarchical  features  of  3dm  provide  methods 
for  organizing  complex  models.  "Grouping"  can  be  used  to 
associate  triangles  and  possibly  other  groups  to  more  easily 
manipulate  them  as  a  whole.  These  groups  can  be  instanced. 
An  instance  is  similar  to  a  copy  of  a  group  that  can  be 
arbitrarily  translated,  rotated,  and  scaled.  However,  the 
difference  between  an  instance  and  a  copy  is  that  the 
instances  of  a  group  are  all  linked  to  the  same  basic  shape.  If 
this  shape  is  changed,  then  the  change  is  reflected  in  all 
instances  at  once.  An  example  where  instancing  would  be 
useful  is  in  a  model  of  a  large  building.  Suppose  that 
hundreds  of  chairs  were  in  this  building.  If  one  model  of  a 
chair  were  instanced  many  times  to  make  these  chairs,  then  a 
change  to  a  single  chair  would  be  reflected  in  hundreds  of 
places  throughout  the  building. 
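
The difference between an instance and a copy can be sketched as follows. The class names are hypothetical, and rotation is left out to keep the example short; each instance holds a reference to the shared group plus its own translation and uniform scale.

```python
class Group:
    """A shape: a list of triangles, each a tuple of three (x, y, z) vertices."""

    def __init__(self, triangles):
        self.triangles = list(triangles)


class Instance:
    """A placed reference to a Group (rotation omitted for brevity)."""

    def __init__(self, group, offset=(0.0, 0.0, 0.0), scale=1.0):
        self.group = group  # shared with other instances, not copied
        self.offset = offset
        self.scale = scale

    def world_triangles(self):
        """Apply this instance's scale and translation to the shared shape."""
        return [
            tuple(
                tuple(self.scale * c + o for c, o in zip(vertex, self.offset))
                for vertex in triangle
            )
            for triangle in self.group.triangles
        ]
```

Editing `group.triangles` once changes what every instance returns, which is the chair behavior described above.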

Groups  can  be  organized  into  a  hierarchy 
represented  by  a  directed  acyclic  graph.  This  type  of 
hierarchy  is  particularly  well-suited  to  modeling  articulated 
figures.  The  ability  to  instance  groups  and  impose  a 
hierarchy  on  them  helps  to  organize  models. 

4  Results 

Actual  modeling  sessions  have  shown  that  3dm  is 
efficient  for  rapidly  prototyping  models.  Organic  shapes, 
like  rocks  and  trees,  have  proven  to  be  particularly  good 
subjects  for  3dm.  These  shapes  are  easily  created  in  3dm 
because  it  provides  a  good  sense  for  spatial  relationships. 
Users  of  3dm  have  commented  that  they  feel  a  sense  of 
control,  because  they  can  reach  out  and  grab  any  part  of  the 
model with ease. The ability to make these quick
modifications encourages the user to experiment with shapes
until they are satisfactory. However, 3dm has shown
weakness in the area of constraints and models that
traditional CAD and drawing programs create well. For
instance, 3dm has no way of keeping two polygons parallel,
causing some models to appear irregular.

The  extrusion  tool  is  an  example  of  a  traditional 
modeling tool that has become even more powerful because
of  its  use  in  a  HMD  framework.  In  most  modeling  systems, 
extrusion  is  performed  by  moving  one  or  two  spatial 
parameters  at  a  time.  3dm  users  often  alter  many  parameters 
at  once  during  an  extrusion  by  twisting  and  translating  the 
new  surface.  Extrusion  in  3dm  often  consists  of  many  short 
extrusions.  In  between  these  short  operations  the  leading 
edge of the extruded surface is often scaled and twisted. The
result  is  that  complex  surfaces  can  be  rapidly  created  with  an 
easy  to  use  tool. 

Some  initial  solutions  to  3dm's  lack  of  constraints 
have been to add toggles in the toolbox for a snap-to grid and
a snap-to plane. The snap-to grid constrains the position of
the cursor to the nodes of a regular 3D grid. The resolution of
the snap-to grid is dynamically modified to be appropriate to
the user's current "grown" or "shrunk" size. The snap-to
plane gives the ability to constrain cursor movement to 2
dimensions. The snap-to constraints help in making regular
objects,  such  as  mechanical  parts. 
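
The two snap-to constraints can be sketched as simple functions of the cursor position. The assumption that grid spacing is proportional to the user's scale factor is ours; the text says only that the resolution is adjusted to the user's current size. All names are hypothetical.

```python
def snap_to_grid(cursor, base_spacing, user_scale):
    """Round each cursor coordinate to the nearest node of a regular grid.

    The grid spacing is assumed to grow and shrink with the user's scale.
    """
    spacing = base_spacing * user_scale
    return tuple(round(c / spacing) * spacing for c in cursor)


def snap_to_plane(cursor, plane_point, unit_normal):
    """Project the cursor onto the plane through plane_point
    whose normal is the unit vector unit_normal."""
    d = sum((c - p) * n for c, p, n in zip(cursor, plane_point, unit_normal))
    return tuple(c - d * n for c, n in zip(cursor, unit_normal))
```

With this scaling, a user shrunk to half size works on a grid twice as fine without changing any setting.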

5  Conclusion 

3dm  draws  techniques  of  model  manipulation  from 
both  CAD  and  drawing  programs  and  applies  them  to 
modeling  in  an  intuitive  way.  A  HMD  modeling  system  uses 
these  tools  to  simplify  the  problem  of  3D  model 
manipulation  and  understanding. 

3dm  is  a  step  toward  making  3D  modeling 
accessible  to  unsophisticated  users.  It  supports  users'  natural 
forms  of  interaction  with  objects  to  give  them  better 
understanding  of  the  shapes  of  their  models.  Even  a  novice 
user  can  understand  how  to  manipulate  a  model  by  reaching 
out  and  grasping  it.  Users  are  encouraged  to  experiment  with 
model  shape  because  3dm  facilitates  making  rapid  changes. 
The  effects  of  a  change  to  a  model  can  be  clearly  understood 
because  the  user  can  explore  the  model  using  a  variety  of 
intuitive  navigation  techniques. 

Advanced  users  are  also  empowered  by  3dm.  Many 
of  the  tools  borrowed  from  existing  modeling  systems 
become  more  powerful  when  used  with  a  HMD.  One  source  of 
increased  utility  is  the  fact  that  complex  operations  can 
involve  simultaneous  modification  of  many  spatial 
parameters. Examples of tools that take advantage of this are
object placement and extrusion, which both allow
combinations of rotation and translation in a single step.
By concentrating more functionality into each operation,
fewer operations are needed to perform a task and models can
be created faster.

Abstract

Recent advances in software and hardware technology have
made direct ray-traced volume rendering of 3-d scalar data a
feasible and effective method for imaging of the data’s contents.
The time costs of these rendering techniques still do
not permit full interaction with the data and all of the parameters
affecting the resulting images. This paper presents
a set of real-time interaction techniques which have been developed
to permit exploration of a volume data set. Within
the limitation of a static viewpoint, the user is able to interactively
alter the position and shape of an area of interest,
and modify local viewing parameters. A run length encoded
cache of volume rendering samples provides the means to
rerender the volume at interactive rates. The user locates
and plants “seeds” in areas of interest through the use of
data slicing and isosurface techniques. Image processing
techniques applied to volumes (i.e. volume processing) can
then automatically form regions of interest which in turn
modify the rendering parameters. This “region growing”
of “seedlings” incrementally alters the image in real-time,
providing further visual cues concerning the contents of the
data. These tools allow interactive exploration of internal
structures in the data which may be obscured by other imaging
algorithms. Magnetic Resonance Angiography (MRA)
provides a driving application for this technology. Results
from preliminary studies of MRA data are included.

1  Introduction 

Three  dimensional  scalar  fields  (or  volumes)  of  data  arise 
in  a  number  of  applications  from  computer  simulation  of 
physical  phenomena  to  data  gathered  for  medical  diagnostic 
use  via  CAT  scans  and  Magnetic  Resonance.  Rendering 
images  directly  from  the  volume  has  been  demonstrated  to 
be an effective method for visualizing such data [7, 9, 11,
12, 14, 19, 27, 29]. Volume rendering avoids many artifacts
which may arise when intermediate graphics primitives are
required [7, 13].

Volume  images  are  constructed  either  in  image  order  by 
sampling  the  volume  along  a  ray  from  an  eye  point  through 
the  data  or  by  projection  of  the  data  directly  onto  the  pixel 
array. Algorithms also differ in the order in
which the samples are processed, either front to back as in
ray tracing [12], or back to front analogous to a painter’s
algorithm. Ray tracing algorithms process a pixel at a time,
while in projection techniques, a single sample or data point
may affect an area of pixels around the sample in some way
via “splatting” [28], or projecting of a representative area
onto the screen [22, 29].

As in earlier image synthesis techniques, acceleration
methods focus on exploiting coherence in the image and in
the data, and/or on progressively refining the image to provide
rough results early in the rendering process [11, 15].
However, high quality images still require many seconds or
minutes. Thus interactive exploration of volume data sets
via these techniques is still not feasible. Changes in viewing
parameters, mappings of data values to opacity or color, or
enhancement of regions of interest require complete new
renderings.

The research presented here exploits the coherence across
all possible images from a given viewpoint to provide interactive
rendering rates for high quality images. The starting
point of this algorithm is the volume ray casting technique
as presented by Levoy [12, 15]. Earlier work in raytracing
[21] has shown that a view dependent cache can be exploited
to good effect when surface properties and light source intensities
need to be adjusted while view position and geometry
remain unchanged. In this paper we apply a similar
idea  to  ray  casting  based  volume  rendering.  The  method 
described  here  caches  rendering  information  at  each  sample 
point  along  each  ray.  The  cache  allows  new  images  based 
on  changes  in  rendering  parameters  to  be  generated  as  the 
changes  are  made,  providing  an  interactive  loop  for  volume 
exploration. 

Local  areas  of  interest  within  the  volume  can  be  indicated 
by  the  user  planting  a  “seed”  in  the  volume.  Local  rendering 
parameters  can  then  be  modified  based  on  location  relative 
to the seed. The basics of interactive use of local rendering
modification through the use of “volume seeds” have been
discussed in an earlier paper [17], and will be summarized
in the next section. Problems in the earlier system included
excessive  storage  requirements,  and  difficulty  in  placing  and 
forming  regions  of  interest. 

The paper continues with a discussion of the use of coherence
in the sample caching process followed by a description
of new interactive positioning tools utilizing an integration
of slicing and isosurface techniques. We then describe the
use of image processing techniques generalized to volumes
(volume processing) to automatically generate matte volumes
modifying local rendering parameters. In this way,
the seed sprouts into a “seedling” to enhance the rendering
in connected regions of particular interest. Rendering
parameters such as opacity are then based on minimum distance
from the seedling. Image processing methods have
been applied to volumes to segment the volume into discrete
regions [18, 25, 26]. However, it should be noted that, in
the application described here, the seedling itself is never
rendered directly, but rather the volume of data continues
to be rendered with volumetric techniques with
modified parameters based on position relative to the grown
region.

Much  of  the  motivation  for  the  development  summarized 
above has come from a medical application, in particular
the imaging of data arising from Magnetic Resonance Angiography
(MRA), in which the focus of attention is on exploring
the vascular structure within the brain or other regions of
the body. MRA techniques are used to diagnose malformations
and aneurysms within the brain's blood supply, and to
plan surgical and catheterization procedures. The intricate
nature of the vascular structure as well as the somewhat
noisy data capture require the ability to focus attention on
specific vessels as potential anomalies are discovered. Results
of the use of the above algorithms on this application
will  be  presented  and  discussed. 

2  Volume  Seeds 

Ray traced volume rendering involves sampling the data volume
at evenly spaced points along a ray, computing a local
illumination and opacity value and compositing the result
with earlier samples along the ray. Individual sample contributions
are computed from trilinearly interpolated values
from surrounding voxels, where each voxel contains a data
value and an estimated gradient determined through finite
differencing from its neighbors. The interpolated value and
gradient provide arguments to mapping functions to determine
color and opacity at the sample point. These color
and opacity values may be derived from a simple mapping
from value to RGB (or opacity) or be determined from
more  complex  statistical  procedures  intended  to  classify  the 
likelihood  of  a  particular  material  (e.g.,  bone,  muscle)  being 
present  at  a  particular  location  [7]. 

The  final  illumination  contribution  at  each  sample  point 
is  computed  from  the  color,  opacity,  local  normal  (estimated 
from  the  gradient  of  the  data),  and  direction  vectors  to  the 
eye  and  lights.  These  parameters  to  a  Phong  lighting  model 
produce the final illumination at each sample point. Finally,
each sample illumination value is composited with earlier
samples along the ray based on the accumulated opacity
along  the  ray.  Sampling  can  stop  when  the  accumulated 
opacity  approaches  unity. 
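
The compositing step with early termination can be sketched as below, assuming a single monochrome illumination channel and samples ordered front to back. The function name and the cutoff value are hypothetical; the weighting is the standard front-to-back "over" accumulation rather than code from the system described.

```python
def composite_ray(samples, opacity_cutoff=0.99):
    """Composite (illumination, opacity) samples front to back along one ray.

    Each sample contributes in proportion to its own opacity and to the
    transparency still remaining in front of it. Sampling stops once the
    accumulated opacity approaches unity.
    """
    color = 0.0
    accumulated = 0.0
    for illumination, opacity in samples:
        weight = (1.0 - accumulated) * opacity
        color += weight * illumination
        accumulated += weight
        if accumulated >= opacity_cutoff:
            break
    return color, accumulated
```

For two half-opaque samples with illumination 1.0 and 0.5, the first contributes with weight 0.5 and the second with weight 0.25, giving a pixel value of 0.625 at an accumulated opacity of 0.75.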

2.1  Image  Coherence  from  a  Static  Viewpoint 

By examining the volume rendering process described above,
the required calculations can be broken into two categories:
those which are independent of, and those which are dependent on,
mappings from position and value to color and opacity.
Map-independent computations include:

•  gradient  calculation  at  voxels, 

•  determination  of  rays,  and  sample  points  along  each 
ray, 

• trilinear interpolation of data values and gradients to
the sample points,

• the determination of local shading parameters, e.g. angles
between view vector, light vector(s), and normal,


•  and  evaluation  of  a  monochrome  local  lighting  model, 
(i.e.  independent  of  color  and  opacity). 

Map-dependent calculations include only:

•  mapping  of  data  and  position  values  to  opacity  and 
color, 

•  final  evaluation  of  the  local  illumination, 

•  and  compositing  sample  value  illumination  for  final 
pixel  color. 

The above lists illustrate the fact that most of the computation
is map-independent. However, the remaining map-dependent
calculations leave wide latitude for modification
of the final image. This includes changes in:

•  the  mapping  from  value  to  color, 

• the mapping from value (and/or the length of the gradient)
to opacity,

•  and  position  based  variation  in  color  and/or  opacity. 

By providing interactive tools to modify the local mapping
to color and opacity, a user can create new renderings in interactive
times (less than one to five seconds at 480x480 resolution
on an SGI 240GXT). The locality of the mappings is
controlled by the interactive specification of matte volumes
[7]. By planting a seed at a point in the volume, opacity
values can be modified as a function of the distance from
the seed location. This allows the user to focus attention on
particular regions of interest. By adding a binary decision
indicating if the sample is in front of or behind an imaginary
plane through the seed, virtual cut-aways can also be produced
in the same way. A final acceleration to the rendering
process can be made by recognizing that only a local region
of screen space will be affected by a new seed location when
the matte volume is limited in size. Details of the matte
volume functions and cut-away techniques can be found in
Ma et al. [17].
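
To make the idea concrete, a seed-based opacity modification might look like the sketch below. The linear falloff, the radius parameter, and the sign convention for the cut-away plane are all assumptions chosen for illustration; the actual matte volume functions are those given in [17] and are not reproduced here.

```python
import math


def seeded_opacity(base_opacity, sample_pos, seed_pos, radius, view_dir=None):
    """Scale a sample's opacity by its distance from the seed.

    Hypothetical matte function: full opacity at the seed, fading linearly
    to zero at `radius`. If view_dir (a unit vector pointing from the eye
    into the volume) is given, samples on the eye side of the plane through
    the seed are made fully transparent, producing a cut-away.
    """
    if view_dir is not None:
        depth = sum((s - c) * v
                    for s, c, v in zip(sample_pos, seed_pos, view_dir))
        if depth < 0.0:
            return 0.0  # in front of the plane: cut away
    distance = math.dist(sample_pos, seed_pos)
    falloff = max(0.0, 1.0 - distance / radius)
    return base_opacity * falloff
```

Because the function is zero beyond `radius`, only rays passing near the seed need to be recomposited, which corresponds to the local screen-space update mentioned above.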

2.2  Sample  Data  Caching 

The ability to quickly modify the image based on new matte
volumes and the related mappings depends on caching the
map-independent information at each sample point. This
includes the partially computed illumination value and trilinearly
interpolated data value for lookup into the interactively
modified mappings. Unlike standard volume rendering,
the storage of samples along a ray cannot stop when
opacity reaches unity since opacity values can be changed
interactively. The current implementation stores a two byte
illumination value and a one byte data value per sample. Although
three bytes per sample is compact, this may
require substantial memory, on the order of 150 Mb for a
500x500 image with 200 samples per ray. This problem can
be largely ameliorated for most data sets by run-length encoding
the sample values along each ray. In particular, if
some range of values, e.g. zeros indicating empty space, can
be a priori ruled as transparent, then both the storage and
subsequent rerendering time can often be reduced by one to two
orders of magnitude. The run length encoding is accomplished
by stealing a bit from the 16 bit illumination value.
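
The storage estimate and the effect of run-length encoding can be checked with a short sketch. The tagged-tuple representation below is an assumption made for readability; the implementation described above instead flags runs with one bit of the 16 bit illumination value, and that bit layout is not reproduced here.

```python
def cache_bytes(width, height, samples_per_ray, bytes_per_sample=3):
    """Uncompressed cache size: one record per sample on every ray."""
    return width * height * samples_per_ray * bytes_per_sample


def run_length_encode(ray_samples, is_transparent):
    """Collapse runs of transparent samples along one ray.

    ray_samples    -- list of (illumination, data_value) pairs
    is_transparent -- predicate on data_value, fixed a priori
    Returns a list of ("skip", count) and ("sample", illumination, value).
    """
    encoded = []
    run = 0
    for illumination, value in ray_samples:
        if is_transparent(value):
            run += 1
            continue
        if run:
            encoded.append(("skip", run))
            run = 0
        encoded.append(("sample", illumination, value))
    if run:
        encoded.append(("skip", run))
    return encoded
```

Here `cache_bytes(500, 500, 200)` gives 150,000,000 bytes, matching the figure quoted above, and a ray that is mostly empty space shrinks to a handful of records.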

3 Seed Positioning

The need for the user to locate and position seeds to indicate
areas of interest requires the ability to easily move and position
a cursor in the three dimensions of the volume. Visual
feedback for this process should provide clues both about the
cursor’s position and some indication of the volume’s content
to allow a seed to be placed near a region with a suspected
anomaly. A multi-modal approach has been taken to serve
these  needs.  Operations  which  can  be  performed  smoothly 
in  real  time  include  manipulation  of  a  rough  3D  isosurface 
model  and  display  of  data  on  a  slice  through  the  volume. 

Isosurface  and  slice  display  provide  the  basis  for  the  user 
interface  which  has  been  developed.  A  low  resolution  isosur¬ 
face  is  computed  from  a  downsized  data  set  by  a  polygoniza- 
tion  algorithm  [3,  16].  This  provides  enough  detail  to  give 
the  user  a  correspondence  between  the  data  set  and  what 
can  be  seen  in  the  volume  rendered  image.  A  slice  through 
the  data  volume  orthogonal  to  the  view  direction  indicates 
the depth position. A "screen door" transparency rendering
of the slice permits continued view of the portion of the
isosurface behind the slice. Finally, the coloration of the
isosurface is based on distance from the slice plane: white
away from the plane, and red where the plane slices the
isosurface. Color plate 1 shows the full screen presented to the
user in the Volume Seedlings system. The slicer/isosurface
interface is in the upper left. Seeds can be deposited on the
slicing plane, which will then affect the subsequent rendering
of the volume in the upper right.

Thus,  the  user  can,  in  real-time,  manipulate  both  the  ro¬ 
tation  of  the  isosurface  and  position  of  the  slice  plane.  By 
pointing  to  some  point  on  the  resulting  image,  a  seed  is 
placed  at  the  depth  of  the  slice  plane  and  in  the  location 
of  the  cursor.  The  volume  rendering  can  then  be  modified 
based  on  the  new  seed  location. 

4  Volume  Seedlings 

A  single  seed  highlights  a  spherical  region  around  the  seed 
point.  In  many  applications,  however,  the  shape  of  the  re¬ 
gion  of  interest  within  the  volume  is  not  strictly  spherical, 
but  rather  is  data  dependent.  The  idea  of  Volume  Seedlings 
is  to  use  the  seed  point  as  a  base  from  which  to  sprout  a 
seedling along paths of "maximum interest", thus highlighting
the region of interest.

Identifying  regions  of  interest  within  the  volume  is  closely 
related  to  the  computer  vision  problem  of  identifying  regions 
of interest within an image. Hence the seedling growth
algorithm is similar to region growing algorithms described in
the computer vision literature [1] and 2D seed fill algorithms
described in the computer graphics literature [8, 24]. One
important difference is a primary interest in the intermediate
states of the growth process. Computer vision region
growing algorithms are primarily concerned with a final
segmentation of the image. A similar problem of extracting
closed regions in a volume of data has been addressed by
Miller et al [18].

The seedling growth algorithm used in the work presented
here is voxel based. A priority queue [20] of voxels is
maintained, determining the voxels within the volume which need to
be explored. Initially, the priority queue contains only the
user specified seed. At each growth step, the highest priority
voxel is extracted from the priority queue and its 26
neighboring voxels are examined. The priority assigned to each
voxel within the queue is based on the "degree of interest"
of that voxel. We have currently experimented with linear
combinations of three priority functions:


Figure  1:  A  graphical  illustration  of  the  opacity  matte  as  a 
function  of  the  distance  from  the  seedling. 


•  classification  based 

The  priority  of  a  voxel  is  based  on  its  material  classi¬ 
fication.  A  voxel  is  given  a  higher  priority  the  greater 
its  percentage  of  some  user  specified  desired  material. 
This  priority  function  encourages  growth  within  regions 
of  this  material. 

•  gradient  based 

The priority of a voxel is based on the magnitude of the
gradient at the voxel. High gradient values indicate
a surface boundary between materials, so this priority
function encourages growth along surface boundaries.

•  position  based 

The  priority  of  a  voxel  is  based  on  the  distance  from 
the  original  seed  point,  thus  encouraging  growth  of  the 
seedling  near  the  position  indicated  by  the  user. 

Many  other  priority  functions  are  possible. 

In  addition  to  the  continuous  priority  function,  a  discrete 
test  is  used  to  eliminate  many  voxels.  Thus,  the  neighbors 
are  inserted  in  the  queue  only  if: 

•  they  haven’t  been  visited  before 

•  they pass an "eligibility" test

The  eligibility  test  is  not  required  but  can  significantly  re¬ 
duce  the  size  of  the  priority  queue  by  eliminating  obviously 
uninteresting  voxels.  Currently,  our  eligibility  test  is  a  sim¬ 
ple  threshold  on  the  priority,  thus  voxels  are  included  in  the 
queue  only  if  their  priority  indicates  at  least  a  modicum  of 
interest. 
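
The growth loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: `priority` is a hypothetical callback standing in for the linear combination of classification, gradient and position terms, and `threshold`, `shape` and `max_voxels` are names introduced here. Python's `heapq` is a min-heap, so priorities are negated.

```python
import heapq
from itertools import product

# The 26 neighbour offsets of a voxel (the 3x3x3 block minus the voxel itself).
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def grow_seedling(seed, priority, shape, threshold=0.0, max_voxels=None):
    """Yield voxels in the order the seedling reaches them.

    priority(voxel) -> float, higher means more interesting (hypothetical
    callback). shape is the (nx, ny, nz) extent of the volume.
    """
    visited = {seed}
    # Negate priorities so the highest-priority voxel is popped first.
    queue = [(-priority(seed), seed)]
    grown = 0
    while queue and (max_voxels is None or grown < max_voxels):
        _, voxel = heapq.heappop(queue)
        yield voxel
        grown += 1
        for dx, dy, dz in NEIGHBOURS:
            n = (voxel[0] + dx, voxel[1] + dy, voxel[2] + dz)
            if n in visited or not all(0 <= c < s for c, s in zip(n, shape)):
                continue
            visited.add(n)
            p = priority(n)
            if p >= threshold:  # eligibility test: a simple threshold
                heapq.heappush(queue, (-p, n))
```

Because the function is a generator, a caller can rerender after each yielded voxel, which reflects the interest in intermediate growth states rather than only the final region.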

The seedling growth process yields a set of voxels, in
priority order, defining a region of interest within the volume.
The region of interest is highlighted through the use of an
opacity matte, as before for a single seed. The opacity matte
volume is based on a function of the distance to the closest
voxel of the seedling, as illustrated in Figure 1.
Figure 2: Traditional volume rendering of the UNC Chapel
Hill  CT  head  data. 


Figure  3:  A  seed  point  is  used  to  highlight  a  spherical  area 
in  the  data  set. 


The seedling opacity matte is computed according to the
following formula:

\[ o(p) = mn + (mx - mn) \cdot \Theta(\mathrm{mindist}(p, s), r) \]

\[ \Theta(dist, r) = \begin{cases} \cos^2\left(\frac{\pi \cdot dist}{2r}\right) & \text{if } dist < r, \\ 0 & \text{otherwise,} \end{cases} \]

where mindist(p, s) is the minimum distance between any
voxel of the seedling s and the sample point p in three-space.
The mn, mx and r parameters are specified by the user.

In  essence,  r  is  used  to  control  how  wide  an  area  the  user 
wants  to  see.  Surfaces  outside  this  area  should  be  semi¬ 
transparent  or  fully  transparent,  determined  by  mn.  Mx  is 
used  to  indicate  how  much  enhancement  is  to  be  made  to 
the  area  near  the  seed.  Note  that  the  opacity  matte  is  never 
stored  explicitly  but  is  instead  computed  on  the  fly  from  the 
distance  to  the  seedling. 
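
A minimal sketch of computing the matte on the fly from a cached distance is shown below. The exact falloff is an assumption here: a cosine-squared curve that equals 1 at the seedling and reaches 0 at distance r, which is one reading of the printed formula. The function names are illustrative.

```python
import math


def falloff(dist, r):
    """Assumed cosine-squared falloff: 1 at the seedling, 0 at distance r."""
    if dist < r:
        return math.cos(math.pi * dist / (2.0 * r)) ** 2
    return 0.0


def matte_opacity(dist, mn, mx, r):
    """Matte value for a sample whose nearest seedling voxel is `dist` away."""
    return mn + (mx - mn) * falloff(dist, r)
```

Since only `dist` is cached per sample, changing mn, mx or r requires re-evaluating this function but no new distance computations.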

By  adding  one  additional  byte  to  the  sample  cache  to  hold 
the  distance  to  the  nearest  point  in  the  seedling,  images  can 
be  computed  incrementally.  As  each  new  voxel  of  interest 
is  extracted  from  the  priority  queue,  only  rays  representing 
pixels which pass near the new voxel need to be processed.
The minimum distance from a sample point to any point
on the seedling is maintained by updating the distance only
when the new point on the seedling is closer than any
previously processed points (as in a Z-buffer algorithm).
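
The Z-buffer-style distance update can be sketched as below. This is an illustration only: it stores distances as floats rather than in one byte, and it scans every sample instead of restricting the work to rays near the new voxel. The function and parameter names are assumptions.

```python
import math


def add_seedling_voxel(voxel, samples, min_dist, r):
    """Fold one new seedling voxel into the per-sample distance cache.

    samples: list of (x, y, z) sample positions (one cache entry each).
    min_dist: list of current nearest-seedling distances, updated in place.
    r: matte radius; samples farther than r from the voxel cannot change.
    Returns the indices whose cached distance shrank (samples to re-render).
    """
    touched = []
    for i, p in enumerate(samples):
        d = math.dist(p, voxel)
        # Same rule as a Z-buffer: keep the new value only if it is closer.
        if d < r and d < min_dist[i]:
            min_dist[i] = d
            touched.append(i)
    return touched
```

The returned index list identifies which cached samples changed, so only the pixels owning those samples need to be recomposited.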

Interactive changes can be made to the matte function
mn, mx, and r parameters, as well as the color and opacity
maps based on data value, without invalidating the cache.
The rerenderings are thus very rapid due to the sample
distance caching and the fact that only a small portion of the
image space is affected by each new voxel added to the region
of interest. The current implementation on an SGI 210GTX
extracts new seedling points and rerenders the volume image
approximately 10 times per second. This dynamic nature of
the seedling growth also provides visual cues to the user.

Figures 2, 3 and 4 illustrate the use of seeds and seedlings on
the CT head set from the UNC Chapel Hill Volume Data
Sets. Figure 2 is a normal volume rendering of the data
set without the use of a seed. Figure 3 uses a seed point
to highlight a spherical region in the neck area. Figure 4
shows a seedling grown from the same seed point using a
gradient based seedling growth priority function. Figure 4
uses a smaller opacity matte radius than Figure 3 to focus
on the seedling itself rather than a broader area around the
seedling. Notice that Figure 4 does a much better job of
isolating the region of interest.


Figure 4: A gradient based seedling is used to highlight a
structure in the data set.


Figure  5:  Four  views  of  the  MRA  vascular  data  set. 


5 Magnetic Resonance Angiography

Magnetic Resonance Angiography (MRA) is used to extract
the vascular structure from within soft tissues like the brain.
Visualizing the vascular structure can help in diagnosing
malformations such as aneurysms and blockages, and/or help
prepare surgical procedures such as catheterization through
the vessels, or other invasive procedures designed to not
disturb the vascular structure. The non-invasive nature of
MRA over traditional angiography makes this diagnostic
approach safer and thus more widely applicable. Unfortunately,
MRA data capture cannot extract single vessels
as can be done by selective dye release from a catheter in
traditional angiography.

The vascular structure in MRA is captured by taking
advantage of the fact that blood flows within the veins and
arteries. The signal which is received is related to the time
in which individual molecules are within the bounds of a
thin slice through the body. Difficulties arise due to noisy
data capture, or dropouts due to vessels which lie in the
plane of excitation.

The most common visualization method used is a simple
Maximum Intensity Projection (MIP) in which, as the name
implies, a simple projection of the data onto a pixel grid is
performed in which the pixel values are given the maximum
value along a corresponding ray through the volume. A
series of such images from different angles are viewed in
succession to provide depth cues. However, single frames lose
depth information, and the sequence does not
provide the full range of geometric information visible with
more sophisticated algorithms.

The goal of the Volume Seedlings approach is to provide
the three dimensional visual cues captured by volume
rendering, while providing interactive tools to explore the
data set and extract individual vessels for closer examination.
Other work has been done in this area to extract single
vessels through connectivity information. Cline et al
selected a voxel value and extracted all voxels of the
same value connected to a seed point, and projected these
voxels directly onto a screen [1, 5]. Other imaging techniques
for MRA have been described as well [2, 6, 10, 23],
however, not in the context of interactive systems with the
use of volume rendering as the final imaging method.


Figure 6: Four intermediate steps in the growth of a seedling.
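
For comparison with volume rendering, the Maximum Intensity Projection described above can be sketched in a few lines. This assumes axis-aligned orthographic rays along z and a volume stored as nested lists indexed `volume[z][y][x]`; both choices are simplifications introduced here.

```python
def max_intensity_projection(volume):
    """Project volume[z][y][x] along z, keeping the maximum per (y, x) ray."""
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [
        [max(volume[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]
```

The result records only the brightest value on each ray, not where along the ray it occurred, which is why a single MIP frame carries little depth information.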

Figure 5 shows four views of the vascular structure within
the brain of a patient suffering from an aneurysm. One single
large seed in the center of the volume is used to capture most
of the vessels while eliminating the vessels at the outer edges
which complicate and obscure the interior. (These images
are rendered at full 1K x 1K resolution as opposed to the
480x180 resolution in interactive mode.) After selection of
a seed in the area of the aneurysm, a seedling is grown to
extract the region of interest (Figure 6). The four images
show the progress of the seedling's growth at four stages.

6  Conclusion 

The ability to interactively isolate regions of interest within
a volume rendering context has been discussed. By growing
Volume Seedlings within the data set according to "interest"
functions, features which may otherwise be hidden by the
image complexity or by opaque regions can be examined.
An interactive volume exploration system
has been described. Finally, the use of these techniques in
the context of Magnetic Resonance Angiography to highlight
individual vessels has been demonstrated.

The application of image processing and computer vision
techniques to the problems involved in scientific visualization
is an exciting area for exploration. It is expected that
other more sophisticated region growing algorithms will be
applicable in a wide variety of applications.
Abstract

The development of 3D visual simulation systems on inexpensive,
commercially available graphics workstations is occurring today
and will be commonplace in the near future. Such systems are
being constructed to move through and interact with 3D virtual
worlds. There are a variety of goals for these systems, including
training, planning, gaming and other purposes where the introduction
of the physical player may be too hazardous, too expensive or
too frivolous to be tolerated. We present such a system,
NPSNET, a workstation-based, 3D simulator for virtual world
exploration and experimentation.

Virtual  World  Systems 

The attention to virtual world systems is particularly appealing
to the researchers of the Graphics and Video Laboratory of the
Department of Computer Science at the Naval Postgraduate School, as
our focus for years has been on the production of prototype 3D
visual simulation systems on commercially available graphics
workstations [9,18-23]. 3D visual simulation systems have many of the
characteristics of virtual world systems in that their purpose has
long been for visualizing and interacting with distant, expensive or
hazardous environments. If we turn off some of our physical
modeling, we can even simulate non-existent 3D environments, so we
feel quite comfortable under the virtual worlds umbrella.

We do not study the construction of our 3D visual simulators on
specially-designed graphics hardware. We instead assume that such
hardware is available from commercial workstation manufacturers.
We build 3D visual simulators on inexpensive graphics workstations
instead of specially-designed hardware because of our observation
that the performance numbers from the manufacturers are so
suggestive.

NPSNET:  Overview 

The Graphics and Video Laboratory has been developing low-cost,
three-dimensional visual simulation systems for the last six
years on Silicon Graphics, Inc. IRIS workstations. The visual
simulators developed include the FOG-M missile simulator, the VEH
vehicle simulator, the airborne remotely operated device (AROD),
the Moving Platform Simulator series (MPS-1, MPS-2 and MPS-3),
the High Resolution Digital Terrain Model (HRDTM) system,
the Forward Observer Simulator Trainer (FOST), the NPS Autonomous
Underwater Vehicle simulator (NPSAUV), and the Command
and Control Workstation of the Future system (CCWF).

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the
title of the publication and its date appear, and notice is given
that copying is by permission of the Association for Computing
Machinery. To copy otherwise, or to republish, requires a fee
and/or specific permission.

Our current visual simulation efforts are on the NPSNET system,
a workstation-based, 3D visual simulator that utilizes SIMNET
databases and networking formats. The DARPA-sponsored
SIMNET project had the goal of developing a low-cost tank simulator
that provided a "70% solution" to the tank-war-gaming problem
[17].

Unfortunately, the SIMNET system as delivered has graphics
hardware and software that suffer from a rigid specification based on
1983 graphics technology, and it was not designed to take advantage
of ever faster and more capable graphics hardware and processor
power. Low-cost for the project meant $250K per station. Instead,
the contractor designed its own graphics platform, its own processing
system, and wrote software that worked only on that platform.
In NPSNET, we want to be somewhat more flexible but still
interact with the DARPA investment.

The NPSNET system is an attempt to explore the SIMNET domain
using a readily available graphics workstation, the Silicon
Graphics, Inc. IRIS workstation in all its incarnations (Personal
IRIS, GT, GTX, VGX...), instead of the contractor produced hardware.
Our starting point is that we assume databases and network
packet formats in a form similar to those utilized by the actual
SIMNET system but allow the flexibility for continuing evolutions in
efficiency.

NPSNET is a real-time, 3D visual simulation system capable of
displaying vehicle movement over the ground or in the air. Displays
show on-ground cultural features such as roads, buildings, soil
types and elevations. The user can select any one of 500 active
vehicles via mouse selection and control it with a six degree of
freedom spaceball or button/dialbox. In between updating events, all
vehicles are dead reckoned to determine their current positions.
Speed in three dimensions and the location of the vehicle can
accurately be predicted as long as the speed or direction of the vehicle
does not change. Vehicles can be controlled by a prewritten script,
or can be driven interactively from other workstations, as the
system is networked via Ethernet. Additionally, autonomous players
can be introduced into the system via a programmable network
"harness" process (NPSNET-HARNESS).

As is obvious from the above overview, NPSNET is in many ways
a departure from the goals of SIMNET. We can "push the envelope"
of real-time, workstation-based virtual reality while providing
a workstation-based SIMNET node. We present our plan for the
overall NPSNET effort in the following sections to provide an
understanding of what is required to construct such a system.
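
The dead reckoning mentioned in the overview amounts to constant-velocity extrapolation between update events. A minimal sketch follows; the function name, argument layout and units (seconds, and positions in arbitrary length units) are assumptions for illustration and do not describe the NPSNET network packet format.

```python
def dead_reckon(position, velocity, last_update_time, now):
    """Extrapolate a vehicle's position assuming constant velocity.

    position, velocity: (x, y, z) tuples from the last update event.
    last_update_time, now: times in seconds (illustrative units).
    """
    dt = now - last_update_time
    return tuple(p + v * dt for p, v in zip(position, velocity))
```

The prediction stays exact only while speed and direction are unchanged, which is why a new update event is needed whenever either one changes.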
SIMNET Database Display Work

The first effort in any virtual world development is obtaining
the data that represents the world to be modeled. For 3D visual
simulations, this usually begins with a large 2D grid of elevation data
that is turned into a 3D terrain carpet.

Once the terrain carpet has been extracted and displayed, attention
then turns to on-ground cultural features and 3D vehicle icons.
On-ground cultural features include roads, forest canopies, trees,
buildings, corrals and other stationary objects. Many cultural
features are provided in 2D and have to be projected onto the terrain.
Significant work must be done to accomplish this. There is the
preprocessing work to turn 2D linear features like roads into 3D,
correctly projected onto the terrain carpet. Projecting planar 3D road
segments onto the terrain carpet is also not easy. The problem is that
it requires projecting the road polygons onto the same plane as the
terrain carpet. Under z-buffering, the standard hidden surface
elimination method for graphics workstations, coplanar, coincident
polygons cause what is known as z-buffer tearing [1]. We see scan
lines alternately colored with the underlying terrain color and the
road color. We solve this by drawing the underlying terrain polygon
first into the RGB planes with z-buffering on but modifications to
the z-buffer off. We then draw the road overlay. Modifications to
the RGB planes are then turned off and the underlying terrain
polygon is again drawn, this time with modifications to the z-buffer on.
This procedure must be done for all coplanar features in the system.
It requires that underlying layers be drawn multiple times and in an
ordered fashion. The visual simulator must handle this in a general
fashion. It is just part of the complexity of building such systems.
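
The three-pass procedure above can be illustrated with a toy software framebuffer. This is a sketch under stated assumptions, not the IRIS GL code: fragments are (pixel, depth, colour) triples, the depth test is "less than or equal", and the road pass is assumed to leave the z-buffer unmodified as well.

```python
FAR = float("inf")


class Framebuffer:
    """Tiny software framebuffer: one colour and one depth per pixel."""

    def __init__(self, n_pixels, background="sky"):
        self.color = [background] * n_pixels
        self.depth = [FAR] * n_pixels

    def draw(self, fragments, write_color=True, write_depth=True):
        """fragments: iterable of (pixel, depth, colour); depth test is <=."""
        for pixel, z, colour in fragments:
            if z <= self.depth[pixel]:
                if write_color:
                    self.color[pixel] = colour
                if write_depth:
                    self.depth[pixel] = z


def draw_road_on_terrain(fb, terrain, road):
    # Pass 1: terrain colour only; the z-buffer is tested but not modified.
    fb.draw(terrain, write_color=True, write_depth=False)
    # Pass 2: road overlay; it cannot lose a depth tie with the terrain,
    # because the terrain's depth has not been recorded yet.
    fb.draw(road, write_color=True, write_depth=False)
    # Pass 3: terrain depth only, so later objects are hidden correctly.
    fb.draw(terrain, write_color=False, write_depth=True)
```

In a naive two-pass draw, a road fragment whose interpolated depth rounds slightly behind the terrain fails the depth test and the terrain colour shows through; deferring the terrain's depth write removes that comparison.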

3D vehicle icons are the next consideration in constructing our
virtual world system. We call them 3D icons in that the goal is not
realism but rather low resolution indicators of players on the terrain.
Low resolution means whatever level of detail the user of the final
system is willing to live with.

Hierarchical  Data  Structures 
for  Real-Time  Display  Generation 

If the modeled world is simple, just blasting all the polygons
through the graphics pipeline ought to get satisfactory display
results. Since NPSNET uses data from the SIMNET Database Interchange
Specification (SDIS) for an actual 50km x 50km terrain area
of Fort Hunter-Liggett, California and has a resolution of one data
point for every 125 meters [6], this will not do.

Hierarchical data structures are the heart of any complex real-time,
3D visual simulator. Such data structures, in conjunction with
viewing information, provide for the rapid culling of polygons
comprising the terrain carpet, the cultural features, the 3D icons and any
other displayable objects. The purpose of this operation is to
minimize or reduce the flow of polygons through the graphics pipeline
of the workstation's hardware. A classic reference to understand
this problem in more detail is [2]. The culling operation is
performed through the traversal of a data structure that spatially
partitions the displayable data. The appropriate hierarchical data
structure to use is problem domain dependent. As we have adapted
NPSNET to additional tasks, we have had to modify and change our data
structure.

Expanding  the  Terrain  Area 

In order to increase performance, the initial NPSNET dataset
has been divided into 2500 text files based on the one kilometer
standard of the military "grid square" with each file containing data
for one square kilometer. These were preprocessed into binary format
and three additional lower resolutions generated (250, 500 and
1000 meter), together with fill polygons for each level. The final
form of the dataset is 2500 binary files, each containing a
multiple-resolution (4 level) description of the terrain for one square km,
stored as a heap-sorted quadtree [7,12].

The final format for the binary terrain data files is designed for
fast access using the C function fread(). All polygon descriptions
are stored in memory-image format; therefore, no data conversion
has to be done during paging. The 2500 files resulting from
preprocessing contain:

•  Count of polygons in each node of full four level quadtree (85
total).

•  Total polygon descriptions in the file.

•  Multi-resolution description of terrain in this square kilometer
stored in quadtree heap-sort order, lower resolutions first (Fig-
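
The count of 85 nodes follows from a full four level quadtree: 1 + 4 + 16 + 64 = 85. The sketch below shows one way to index such a tree, assuming that "heap-sort order" means the usual implicit breadth-first array layout with the coarsest node first. That reading, and all function names, are assumptions made here rather than details taken from the file format.

```python
LEVELS = 4  # resolutions per square kilometre: 1000, 500, 250 and 125 m


def node_count(levels=LEVELS):
    """Number of nodes in a full quadtree with the given number of levels."""
    return sum(4 ** level for level in range(levels))


def children(i):
    """Indices of the four children of node i in heap (breadth-first) order."""
    return [4 * i + k for k in range(1, 5)]


def parent(i):
    """Index of the parent of node i (the root, 0, has none)."""
    return (i - 1) // 4 if i > 0 else None


def level_of(i):
    """Depth of node i: 0 is the coarsest (root) node."""
    level, first = 0, 0
    while i >= first + 4 ** level:
        first += 4 ** level
        level += 1
    return level


def polygon_offsets(counts):
    """Start offset of each node's polygons, given per-node polygon counts."""
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    return offsets, total
```

With the per-node counts stored at the front of a file, a running sum like `polygon_offsets` locates any node's polygons inside one contiguous block, which fits the stated goal of reading the data without conversion.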


As the final dataset is too large to store in main memory at one
time, and we do not wish to limit the simulation to some smaller
area, paging terrain data through a dynamic algorithm is required.

A 16km x 16km active area was chosen based on considerations
for memory size of available workstations, frame rates, required
field of view and desired range of views. This amount of terrain data
is in main memory at any given time and available for rendering.
Sixteen kilometers allows a seven kilometer field of view in all
directions for immediate rendering with one kilometer acting as a
buffer to ensure terrain is fully paged in before attempting to render
it (Figure 2).
On multi-processor workstations, the simulator does not wait
for additional terrain to be paged in. Instead, the additional CPUs
are used to page in the terrain in parallel.

Terrain Paging Algorithm

When the simulator is initialized, the driven vehicle is centered
on the 16 x 16 active area. The indices of the center one kilometer
square containing the driven vehicle become the notional center.
Data is loaded into the appropriate elements of a 50 x 50 array, and
a bounding box is established around the driven vehicle (centered
on the center square). When the driven vehicle reaches
the bounding box in any direction, memory space is freed in the
direction opposite of travel, terrain is paged in the direction of travel,
and the bounding box moves. The size of the bounding box can be
adjusted as required by vehicle speed/turn rate characteristics.
Terrain paging is independent of the hierarchical data structure
implementation.
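
The paging scheme above can be sketched as a sliding 16 x 16 window over the 50 x 50 grid. This is an illustration under assumptions: the bounding box is modelled as a margin of whole squares around the window centre, the window shifts one row or column at a time, and `load` and `free` are hypothetical caller-supplied callbacks standing in for the file reads and memory release.

```python
GRID = 50      # the full dataset is GRID x GRID one-kilometre squares
ACTIVE = 16    # the active area is ACTIVE x ACTIVE squares


class TerrainPager:
    def __init__(self, vehicle_col, vehicle_row, load, free, margin=1):
        """load(col, row) / free(col, row) are caller-supplied callbacks."""
        self.load, self.free, self.margin = load, free, margin
        # Lower-left corner of the active window, clamped to the dataset.
        self.col0 = min(max(vehicle_col - ACTIVE // 2, 0), GRID - ACTIVE)
        self.row0 = min(max(vehicle_row - ACTIVE // 2, 0), GRID - ACTIVE)
        for c in range(self.col0, self.col0 + ACTIVE):
            for r in range(self.row0, self.row0 + ACTIVE):
                load(c, r)

    def _shift(self, dcol, drow):
        new_c0, new_r0 = self.col0 + dcol, self.row0 + drow
        if not (0 <= new_c0 <= GRID - ACTIVE and 0 <= new_r0 <= GRID - ACTIVE):
            return  # already at the edge of the dataset
        old = {(c, r) for c in range(self.col0, self.col0 + ACTIVE)
               for r in range(self.row0, self.row0 + ACTIVE)}
        new = {(c, r) for c in range(new_c0, new_c0 + ACTIVE)
               for r in range(new_r0, new_r0 + ACTIVE)}
        for sq in sorted(old - new):
            self.free(*sq)      # strip opposite the direction of travel
        for sq in sorted(new - old):
            self.load(*sq)      # strip in the direction of travel
        self.col0, self.row0 = new_c0, new_r0

    def update(self, vehicle_col, vehicle_row):
        """Call whenever the driven vehicle enters a new grid square."""
        centre_c = self.col0 + ACTIVE // 2
        centre_r = self.row0 + ACTIVE // 2
        if vehicle_col > centre_c + self.margin:
            self._shift(1, 0)
        elif vehicle_col < centre_c - self.margin:
            self._shift(-1, 0)
        if vehicle_row > centre_r + self.margin:
            self._shift(0, 1)
        elif vehicle_row < centre_r - self.margin:
            self._shift(0, -1)
```

Each shift frees 16 squares and loads 16 squares, so the resident set stays at 256 squares regardless of where the vehicle travels.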


Terrain  Rendering 

Terrain rendering involves several steps in NPSNET:

•  Determine which 1000m x 1000m squares are actually in the
field of view.

•  Determine resolution within each 1000m x 1000m square
(there may be at most two resolutions), including which fill
polygons are needed.

•  Render the terrain.

Two algorithms are involved. One checks to see if a polygon is
within the field of view by calling a procedure that checks for the
intersection of a point (each point of the polygon) and a polygon
(the triangle composing the field of view) [5]. The other determines
the resolution, essentially which nodes of the quadtree to render, by
checking the intersection of nodes with concentric circles
corresponding to ranges of the resolutions [13]. The circle-rectangle and
point-polygon intersection algorithms are applied repetitively to
render only terrain within the field of view and at the appropriate
resolution levels. Figure 3 depicts multi-resolution within the field
of view. NPSNET is graphics bound. Therefore, the computational
expense of the above algorithms is preferable to rendering terrain not
actually in the field of view.
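
The two intersection tests can be sketched in 2D as follows. These are standard textbook formulations chosen for illustration, not necessarily the ones in the cited references; the function names and the tuple-based point representation are assumptions.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p, a, b, c):
    """True if p lies inside or on triangle abc (either winding order)."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def circle_intersects_rect(centre, radius, rect_min, rect_max):
    """True if a circle overlaps an axis-aligned rectangle."""
    # Closest point of the rectangle to the circle centre.
    nx = min(max(centre[0], rect_min[0]), rect_max[0])
    ny = min(max(centre[1], rect_min[1]), rect_max[1])
    return (nx - centre[0]) ** 2 + (ny - centre[1]) ** 2 <= radius ** 2


def polygon_in_view(polygon, eye, left, right):
    """A polygon is kept if any of its vertices is inside the view triangle."""
    return any(point_in_triangle(v, eye, left, right) for v in polygon)
```

A vertex-only test can miss a polygon large enough to cover the whole view triangle without having a vertex inside it; with terrain polygons much smaller than the field of view that case should be rare.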


Figure 3 - Multi-Resolution Rendering (diagram labels: 16x16 Active
Area, Field of View, Resolution)


Implementation  of  the  above  has  resulted  in  a  doubling  of  the 
performance  of  the  simulation  over  high  resolution  rendering  alone. 
However,  performance  when  large  numbers  of  objects  (trees,  vehi¬ 
cles)  are  present  in  the  viewing  area  does  not  change. 

NPSOFF: Overview

The development of interesting virtual world systems requires
the modeling of many different graphical objects. How these objects
are represented in the system plays a major part in determining
the capabilities and efficiency of the system. We use a simple,
flexible object description language, called NPSOFF, to model
graphical and some non-graphical aspects of our objects.

NPSOFF is a language system that consists of "tokens" that
represent graphical concepts. These tokens are combined in an ASCII
file to represent an object. The object can then be referenced by an
application in an abstract manner. The application does not need to
know the details of how the object is composed. The level of
abstraction that NPSOFF provides offers numerous advantages that
are discussed below. NPSOFF objects can have varying levels of
complexity to represent a wide range of graphical objects and
environments. NPSOFF also serves as a standard for application
development. This makes general purpose tools plausible and extremely
useful.


Functional  Description 

The NPSOFF language can be broken down into tokens. In early
versions of the language, the tokens corresponded almost one for
one to GL functions. Later versions have added more abstraction
and flexibility. The language tokens simplify the interface to the GL
library by hiding components and help encapsulate some of its
complexity.

NPSOFF extends the GL interface by allowing many system
settings to be named. Naming system definitions allows us to build
libraries of commonly used settings like materials and textures.

NPSOFF tokens generally belong to one of three categories:
definition, display or characteristics/composition. The definition
tokens define graphics system settings. Definition tokens define
lights (normal and spot), lighting models (normal and two-sided),
materials, textures, and colors. Definition tokens are named and
stored in tables for later access.

Display or execution tokens make up the bulk of NPSOFF. Display
tokens represent a change in graphics system state or graphics
primitives. They are stored in a sequential display list in the order
that they appear in an NPSOFF file. Example tokens that change
the system state are: setmaterial, setlight, settexture, etc. Each of
these tokens has a name argument that corresponds with an earlier
definition. In the case of setlight, the named light is associated with
one of seven possible light numbers [14]. These state tokens make
it easy to manipulate the graphics pipeline. Complex lighting and
shading effects can be done with NPSOFF in a simple and
straightforward way.
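
The relationship between named definitions and the sequential display
list can be sketched as follows. This is only an illustrative Python
sketch, not the NPSOFF implementation: the class ObjectFile and its
method names are hypothetical, and only the idea of named tables plus
an ordered display list comes from the text.

```python
class ObjectFile:
    """Hypothetical in-memory form of one object file (not the real NPSOFF API)."""

    def __init__(self):
        # Definition tokens are stored by name in per-kind tables.
        self.tables = {"material": {}, "light": {}, "texture": {}}
        # Display tokens are kept in the order they appear in the file.
        self.display_list = []

    def define(self, kind, name, settings):
        """Definition token: record a named setting for later access."""
        self.tables[kind][name] = settings

    def set_state(self, kind, name):
        """State token (e.g. setmaterial): must name an earlier definition."""
        if name not in self.tables[kind]:
            raise KeyError(f"set{kind} refers to undefined {kind} '{name}'")
        self.display_list.append(("set" + kind, name))

    def primitive(self, kind, vertices):
        """Primitive token such as a polygon or line."""
        self.display_list.append((kind, vertices))

    def replay(self):
        """Walk the display list in order, pairing each primitive with the
        state that was current when it was reached."""
        state, drawn = {}, []
        for token, arg in self.display_list:
            if token.startswith("set"):
                state[token[3:]] = arg
            else:
                drawn.append((token, dict(state)))
        return drawn
```

A state token that names an undefined setting is rejected when it is
added, which mirrors the requirement that the name correspond to an
earlier definition.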

The graphics primitives used in NPSOFF are: polygons, surface
(polygon with vertex normals), triangular mesh and lines. Additional
display tokens perform manipulations of the system matrix stack.
NPSOFF objects and components of objects can be transformed
within the object definition file. The tokens loadmatrix, multmatrix,
pushmatrix, popmatrix, rotate, scale and translate define
stack manipulations.

The third category of NPSOFF tokens allows the user to define
object characteristics and composition. This allows a high level of
abstraction and supports complex graphics techniques. Two of the
main abstractions are composite objects and polygon decaling.
NPSOFF objects can be named and contain nested object definitions.
These definitions can contain any display tokens. This structure
allows multiple related objects to be treated as a single object
for display. It also minimizes the duplication of primitive
definitions. Objects are defined with one token and displayed
with another token. This structure is flexible and useful
for building complex objects from simpler sub-objects.

Using an NPSOFF object is also simple. Essentially the user
needs to use only three function calls to access and display an
object. There are many more programmatic entry points to NPSOFF
but many of them deal with in-memory manipulation that is not
needed for simple use. They are used primarily by tools that build
or manipulate NPSOFF objects.

Physical  Modeling  Support 

Simulations developed in the Graphics and Video Laboratory
have each handled physically-based modelling (PBM)
independently and internally. The latest extension of the NPSOFF
system is an object-oriented PBM system [8]. These enhancements
give NPSOFF objects physical characteristics and provide
mechanisms to control an object's motion given a list of internal and
external forces on the object. Objects are handled in an enclosed
referent called the “environment”. All objects that participate in the
NPSOFF PBM system are members of the environment.

The NPSOFF PBM system models object rigid-body dynamics
using a Newtonian framework. An object can be given many
physical properties using a physics definition token. These properties
include the object's initial location and location constraints in the
environment, initial orientation and orientation constraints, initial
linear and angular velocities and constraints on each, the object's mass
and center of mass, the object's ability to absorb forces (elasticity),
the dimensions of a bounding volume and a local viewpoint for the
object. Each object can also use its own system of measurement.
A units token allows the user to specify the units of measurement
for dimensions, force magnitude and mass. This capability
was incorporated to accommodate the use of object models from
various sources. The PBM system uses reasonable constant or
calculated defaults for all physical characteristics so none of the
properties is required to be present when object physical
characteristics are defined.

Forces are defined and added to an object's force list with a
force definition token. Two types of forces are supported: deforming
and non-deforming. Deforming forces are used for object explosions
and bending. Non-deforming forces are used to alter an object's
linear and angular velocities. Forces can be specified as awake or
asleep. This allows the selective application of previously defined
forces. The characteristics of a force defined with this token are: type
(deforming/non-deforming), origin relative to object center and
origin constraints, force direction vector, magnitude and magnitude
constraints and force state (asleep/awake).

The run-time interface of the NPSOFF PBM system is simple
and flexible. Once the PBM environment has been initialized, the
user can add or delete objects from the environment, add and
modify global forces, modify object physical characteristics, add and
modify object force characteristics and modify object and force
states. The environment is processed once each display cycle. The
processing involves resolving forces, calculating object states and
displaying the objects. The NPSOFF PBM system provides us with
a simple environment to model object dynamics and interaction.
This is one of our first steps to add more physical reality to our
applications.
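
One way to picture the per-cycle processing described above is the
following Python sketch. It is an assumption-laden illustration, not
the NPSOFF PBM code: the names Force, Body and step are hypothetical,
only non-deforming forces and linear motion are handled, and simple
Euler integration is assumed because the text does not say which
integration scheme is used.

```python
from dataclasses import dataclass, field


@dataclass
class Force:
    direction: tuple      # unit vector (x, y, z)
    magnitude: float      # in the object's own force units
    awake: bool = True    # asleep forces are defined but not applied


@dataclass
class Body:
    mass: float
    position: list
    velocity: list
    forces: list = field(default_factory=list)


def step(environment, global_forces, dt):
    """Process every body once per display cycle: resolve the awake
    forces, then update linear velocity and position."""
    for body in environment:
        net = [0.0, 0.0, 0.0]
        for force in list(global_forces) + body.forces:
            if not force.awake:
                continue  # selective application of previously defined forces
            for i in range(3):
                net[i] += force.direction[i] * force.magnitude
        for i in range(3):
            body.velocity[i] += net[i] / body.mass * dt   # a = F / m
            body.position[i] += body.velocity[i] * dt
```

Toggling a force between awake and asleep changes the outcome of the
next cycle without removing the force from the object's force list.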


Advantages 

NPSOFF provides many advantages to the researchers in the
NPS Graphics and Video Laboratory:

□ NPSOFF allows an application independent description of
graphical objects. Objects can be designed and maintained by
general purpose tools. Collections of objects can be built and
shared with other researchers.

□ NPSOFF adds a level of abstraction that greatly simplifies
application development. Also, by having a large collection of
common objects, developers can concentrate on how objects
should be used rather than designing and rendering the objects.

□ NPSOFF provides a simple, object oriented, run-time interface
to an object. Functions such as read_object(), display_object()
and delete_object() all operate on individual objects in memory.
Many functions are provided so flexible manipulations are
possible.

□ The stand-alone, reusable nature of NPSOFF objects encourages
the use of common libraries of definition tokens such as
materials and textures.

Support  Tools 

The wide use of NPSOFF in our laboratory has led to a variety
of tools to aid in the design and maintenance of NPSOFF objects.
These tools include: the OFF calculator, NPSME - a material
editor, NPSTE - a texture editor, NPSICON - a model builder and
NPSMOVER - a physically based design editor.

The OFF calculator allows in-memory manipulation of NPSOFF
objects using a simple command line interface. Using the
OFF calculator, objects can be transformed (transformation applied
to all primitives), primitives can be added to an object, graphical
objects (spheres, boxes, etc.) can be added to an object and objects
can be concatenated.

NPSME is a material editor that helps manage libraries of
materials [26]. It reads and writes material definitions. Material
definitions can be selected from the library for viewing and editing. The
material editor helps us to maintain a large collection of material
definitions used by NPSOFF objects in our applications. The ability
to interactively design and modify material definitions is very
important to rapid application development.

NPSTE is a texture editor that helps manage libraries of
NPSOFF texture definitions [26]. NPSTE can use images in many
formats as textures. Portions of an image can be copied and used as a
texture image. Textures can be viewed on any NPSOFF object
using either the texture coordinates specified in the object or
automatically generated coordinates using the GL function texgen() [14].
Textures can be edited using a simple pixel editor. Finally, a texture
definition can be saved in a library of textures and the library saved
as an NPSOFF file. The texture editor lets developers interactively
create, select and view textures independent of a developing
application.

NPSICON is an interactive object design tool [10]. NPSICON
lets a developer design or modify NPSOFF objects using a set of
predefined building blocks. NPSICON is designed to be used
primarily to build vehicular models. Objects can be edited and
transformed in many ways and then saved to an NPSOFF file. NPSICON
allows rapid prototyping of vehicular objects for use in
applications. It also allows developers to modify existing models quickly
and easily.

NPSMOVER provides an environment for users to design and
test physical dynamics of NPSOFF objects [8]. NPSMOVER reads
any NPSOFF file and assigns default physical characteristics if not
present. The user can then adjust all physical characteristics of the
object. Forces can be defined and added to an object's force list
using interactive controls. Once the object's initial conditions,
constraints and characteristics are set and the forces acting on the object
are defined, the dynamics can be “turned on”. The user can
observe the effects of the forces and make necessary adjustments.
Once the user is satisfied, the object can be saved to an NPSOFF file
with the needed tokens. The NPSMOVER tool provides a simple
interactive environment to view and adjust an object's basic
dynamic behavior.

NPSOFF  Future  Directions 

Current and future projects at NPS are working to extend and
improve NPSOFF, including support for defining inter-object
relationships and constraints. This would allow the composite object
structure to be extended to where each subobject has physical
properties that affect the behavior of the whole object. Also the notion
of linked objects will be explored in the context of NPSOFF. This
will allow the realistic modeling of such things as vehicle controls
(e.g. aircraft stick movement changes a control surface, which
changes the flight of the whole aircraft).

Another area that future research will address is animation
support within NPSOFF. Support for continuously animated portions
of an object (vehicle antennae) or constraint management of
subobjects (doors, arms, etc.) would be very useful to our researchers.
Such a system would benefit from the standardization that NPSOFF
provides and offer many more capabilities to developers.

The NPSOFF system is object-oriented in its design and use but
is implemented in a non-object oriented language. Modifying or
extending the current system is time consuming and error prone. We
are currently redesigning NPSOFF to be truly object-oriented and
implementing it in C++. The main benefits of moving to an
object-oriented implementation will be increased extensibility through
inheritance and polymorphism and better maintainability.

Collision  Detection 

In earlier versions, NPSNET did not detect nor respond to
vehicle collisions. Without collision detection and response, the realism
was poor. Even with texturing, environmental effects and realistic
looking vehicles, the virtual world falls apart the first time one
vehicle drives through another. A possible solution to this problem
would be to prevent interpenetrations by bouncing objects off of
each other after any contact, but this is rarely accurate. Another
possible solution is to destroy the objects involved in collisions. A third
option is to combine these two solutions along with varying stages
of damage to involved objects depending upon the physical
characteristics of the involved objects. The current version of NPSNET
detects and responds to collisions between objects in real-time.
Detection is sufficiently fast to allow the time needed to respond
properly. Response time is dependent upon the level of physically based
modeling.

Collisions  with  Fixed  Objects 

The algorithm for collisions with fixed objects constantly
checks moving vehicles to determine if a collision has occurred.
The position of the moving vehicle is updated constantly.
Consequently, as soon as a vehicle is moved and its position is updated, it
is checked for a collision. In order to maintain a real-time speed the
scope of the collision detection is severely limited. A collision with
fixed objects is checked only if the moving vehicle is below a
threshold elevation. All fixed objects are in some way attached to
the terrain and thus below that threshold elevation. If an object is
below that elevation, NPSNET runs through a linked list of fixed
objects which are attached to the current gridsquare.


Collisions  with  Moving  Objects 

A collision with other moving objects is more complicated
since any other moving vehicle or object has the potential for
colliding with the vehicle we are checking. The potential exists for
checking up to 500 vehicles and any of their expendable weapons.
Consequently, the scope of the collision detection range has been
limited in several ways.

As soon as each vehicle is moved its position is checked
against the position of the neighboring vehicles. If the X or Z
position of any other vehicle is within 100 meters of the checked vehicle
then those two vehicles are sent to the second level check. At the
second level check, the distance between the two vehicles is
calculated. If this distance is less than the combined radii of the two
vehicles, then a collision has occurred and the third level collision
check is done. A rudimentary form of ray tracing determines the
actual point of collision.
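
The first two culling levels can be sketched in Python as shown below.
This is a hedged illustration rather than NPSNET code: the dictionary
layout of a vehicle and the function names are assumptions, the
100 meter range comes from the text, and the first-level test follows
the wording of the text ("X or Z").

```python
import math

COARSE_RANGE = 100.0  # meters, the first-level range given in the text


def coarse_check(a, b):
    """Level 1: cheap per-axis test, following the text's 'X or Z' wording."""
    return (abs(a["x"] - b["x"]) <= COARSE_RANGE
            or abs(a["z"] - b["z"]) <= COARSE_RANGE)


def sphere_check(a, b):
    """Level 2: true distance against the combined bounding radii."""
    distance = math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))
    return distance < a["radius"] + b["radius"]


def collision_candidates(vehicle, others):
    """Return the vehicles that pass both levels; a level-3 ray cast
    (not shown) would then look for the actual point of collision."""
    return [o for o in others
            if coarse_check(vehicle, o) and sphere_check(vehicle, o)]
```

The ordering matters for speed: the per-axis comparison rejects most
pairs before the more expensive distance calculation is attempted.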

If worst case numbers are used to determine the implicit range
limitations of all vehicles, it can be shown why this culling is fairly
accurate. Reasonable speed limitations of the various types of
vehicles are used to calculate worst cases for each (Table 1).
Consequently, the movement across more than two gridsquares within
one tenth of a second, one frame, is unlikely.
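
The worst-case figure for each vehicle type follows from a unit
conversion: speed in KPH is converted to m/sec and divided by the
frame rate. The short Python sketch below shows the arithmetic; the
function name is hypothetical and the 72 KPH input in the usage note
is an arbitrary illustration, not a value taken from Table 1.

```python
def meters_per_frame(kph, frames_per_sec=10):
    """Distance covered in one frame at a given speed.

    The default of 10 frames per second matches the one-tenth-of-a-second
    frame mentioned in the text.
    """
    meters_per_sec = kph * 1000.0 / 3600.0
    return meters_per_sec / frames_per_sec
```

For example, a vehicle moving at 72 KPH travels 20 m/sec, which is
2 meters per frame at 10 frames per second.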


Table 1: VEHICLE MOVEMENT LIMITATIONS

Columns: KPH, m/sec, Frames/sec (10), m/Frame. The per-vehicle
values are not legible in this copy.


Collision detection is accomplished by determining if one
object's bounding sphere has interpenetrated another. The radius used
in the spherical check is the maximum distance from the center of
the object to the furthest outer vertex. In the collision response
portion of the system, the actual object's penetration point is
determined. A slightly smaller value than the actual radius of the object
is used for the radius. This produces a more realistic collision
possibility since it increases the likelihood of an actual collision of the
checked objects and not just their spheres. Once the collision has
been detected, the extent of damage and collision response are
determined.

Collision  Response 

Collision response is handled by a function which takes into
account speed and angle of impact, mass of the objects involved,
explosive potential, resistance to destruction, moldability of the
objects, rigidity and fabricated spring forces which determine the
bouncing-off effect and likelihood of survivability. Each of these
factors is weighted in order to provide as realistic an effect as
possible while maintaining the environment in real-time.

Moving  Objects 

In the case where two moving objects impact, all of the
physically-based modeling characteristics of each object must be
considered. The collision point must be known to create realistic
responses in the involved objects. The collision point determines the point
for any type of bending, crumpling and molding. Moreover, if the
point of collision is part of a wall that is interconnected to several
other walls then there will have to be corresponding responses in
those interconnected walls. The only way to find the collision point
is through ray tracing.

The first ray is shot from the center of a moving object towards
the center of an adjacent object to determine a possible point of
collision. This collision may simply be between the bounding spheres
of the two objects and not the actual objects themselves. The
intersection between the first ray and the second object's bounding
sphere is used to specify the direction of a second ray originating
from the adjacent object's center.

The second ray determines if one of the object's actual
polygons was penetrated. This second ray is the ray used in Haines'
algorithm. This algorithm from Glassner [4] was adapted for use in
the collision point determination. It involves running through the
list of polygons that comprise the adjacent object and determining
if the second ray intersects the plane containing the polygon. If no
intersection is found once all of the polygons have been checked,
then only the spheres were penetrated and not the objects
themselves.
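
The polygon loop described above can be sketched in Python as
follows. This is a generic ray/plane test followed by an inside test,
offered as an illustration and not as the adapted algorithm from [4]:
the function names are hypothetical, and the inside test assumes
convex, planar polygons with consistent vertex winding, which the text
does not state.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_polygon(origin, direction, poly, eps=1e-9):
    """Return the hit point on a convex planar polygon, or None."""
    normal = cross(sub(poly[1], poly[0]), sub(poly[2], poly[0]))
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None                      # ray is parallel to the plane
    t = dot(normal, sub(poly[0], origin)) / denom
    if t < 0:
        return None                      # plane lies behind the ray origin
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    for i in range(len(poly)):           # hit must be inside every edge
        edge = sub(poly[(i + 1) % len(poly)], poly[i])
        if dot(cross(edge, sub(hit, poly[i])), normal) < -eps:
            return None
    return hit


def first_polygon_hit(origin, direction, polygons):
    """Run through the polygon list; None means only the spheres overlapped."""
    hits = [h for h in (ray_hits_polygon(origin, direction, p) for p in polygons)
            if h is not None]
    if not hits:
        return None
    return min(hits, key=lambda h: dot(sub(h, origin), sub(h, origin)))
```

Returning None when no polygon is hit corresponds to the case in which
only the bounding spheres were penetrated.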


Reactions 

The proper response is performed by comparing the
characteristics of two objects involved in the collision. For fixed objects, the
responses include several degrees of damage, based upon the speed
and mass of the colliding object. Up to three levels of damage plus
the original undamaged fixed object are available for display after
a collision. For mobile objects, the response depends upon the angle
of impact as well as the speed and mass of the two involved objects.
The mobile object reacts by either bouncing away or being
destroyed and exploding. In the special case of contact by munitions,
the only response is an explosion. The limited number of options
available for the response to the collision keeps the response fast
enough to maintain the real-time criteria. The collision point and
direction of travel are passed to another module that handles
physically-based modeling of object movement. This function's
implementation can be seen in [8].

SIMNET  Networking  Integration 

SIMNET networking integration is part of our NPSNET efforts
on software structures for world modeling in that networking
provides the locations and actions of other players in our visual
simulators. We use Ethernet and TCP/IP multicast packets of our own
design for the current NPSNET system. We are in the process of
integrating the networking system with the SIMNET standard
packets as the full description and documentation is now available. This
connection to SIMNET will provide players, weapons firing and
other state information with which we can test our world modeling
efforts. At a later stage, we hope to examine some of the available
work on higher speed networks, such as FDDI, as it becomes
commercially available and relevant.

NPSNET-HARNESS Structure

The NPSNET-HARNESS process was developed to allow the
rapid integration of different components into the NPSNET
simulation system and in partial response to Ethernet's speed and
addressing limitations [15]. The high level structure of the network harness
is shown in Figure 4. The harness is divided into two main sections,
the Network Daemon and the User Program Interface, which
communicate via shared memory. The principal purpose of the Network
Daemon is to provide low level data and network management
support for user written NPSNET “player” programs. Player programs
developed by users are stand alone applications that provide
specific world interaction functionality.

The User Program Interface consists of a set of routines that
allow the programmer to interact with the network at a higher level of
abstraction. These functions include setting up the shared memory
space with the network daemon, creation of a network read key,
message formatting, and the actual reading and writing of network
messages.


Message  Types 

One of the interesting things about the Ethernet network is that
it is more efficient to have a few long messages rather than many
short messages [16]. This influenced the creation of five message
types and formats.

The message types NEWSTATMESS and DELSTATMESS
are used when a station enters the network and when it no longer
is an active player in the networked environment. These are used
solely as administrative messages and do not affect the appearance
of any vehicle.


[Diagram: the User Program (Network Send, Network Read) and the
Network Daemon (send and receive message processes) exchange
message data through Shared Memory, which holds the outgoing queue.]

Figure 4 Structure of the NPSNET Network Harness


One of the features of NPSNET is the capability of allowing the
user to change vehicles during the execution of the simulation. The
SWITCHMESS notifies all the other nodes on the network that the
user has changed vehicles. This does not affect the appearance of
any of the vehicles.

The UPDATEMESS is the largest message used in NPSNET
and it is also the most common, accounting for almost all the
network traffic. Before we discuss this message, the concept of the
state of the vehicle must be covered. As mentioned previously, the
vehicle's position is updated only after a speed or direction change.
The tilt and roll of the vehicle can be derived from the location on
the terrain and need not be sent across the network. Additionally,
the orientation of the turret, the gun elevation, vehicle destruction,
and weapons firing all change the state of the vehicle. Whenever
any of these state parameters change, a message must be sent to
update the other network nodes.

Since it is more efficient to have a few long messages rather
than many short ones, we combined all of the vehicle state
parameters into a single message. This has the additional benefit of
updating all of the vehicle parameters at the same time to ensure accurate
placement and orientation of the vehicle.
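
Combining the state parameters into one message might look like the
Python sketch below. The field list, field order, sizes and flag bits
are all assumptions made for illustration; the actual UPDATEMESS
layout is not given in the text. Tilt and roll are deliberately
omitted because the text says they are derived from the terrain.

```python
import struct

# Hypothetical layout: vehicle id, x, z, heading, speed, turret angle,
# gun elevation, and one byte of status flags, in network byte order.
UPDATE_FORMAT = "!H6fB"
FLAG_DESTROYED = 0x01
FLAG_FIRING = 0x02


def pack_update(vehicle_id, x, z, heading, speed, turret, gun_elevation,
                destroyed=False, firing=False):
    """Pack every state parameter into a single fixed-size message."""
    flags = (FLAG_DESTROYED if destroyed else 0) | (FLAG_FIRING if firing else 0)
    return struct.pack(UPDATE_FORMAT, vehicle_id, x, z, heading, speed,
                       turret, gun_elevation, flags)


def unpack_update(data):
    """Recover the full vehicle state from one message."""
    (vehicle_id, x, z, heading, speed,
     turret, gun_elevation, flags) = struct.unpack(UPDATE_FORMAT, data)
    return {
        "vehicle_id": vehicle_id, "x": x, "z": z, "heading": heading,
        "speed": speed, "turret": turret, "gun_elevation": gun_elevation,
        "destroyed": bool(flags & FLAG_DESTROYED),
        "firing": bool(flags & FLAG_FIRING),
    }
```

Because every parameter travels together, a receiver never applies a
new turret angle to a stale position.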

NPSNET-HARNESS Future Directions

Currently there are two major efforts underway concerning
NPSNET-HARNESS. The first of these is the porting of the system
to Sun SPARC workstations. We envision providing the user a
standard network interface for both the IRIS and Sun workstations. This
will allow the development of Autonomous Agents (AA) and
Semi-Automated Forces (SAF) that can interact with the vehicles that are
driven on the IRIS workstations. Our 100+ departmental Sun
workstations would then serve as a distributed multiprocessor.

The second major effort is the utilization of the SIMNET
Protocols [11]. As shown in Figure 5, we plan on constructing an
interface between the User Program and the Network Daemon to
convert the format of the protocols between the internal and external
protocol. This will later be extended to the DIS Protocols [5] as
well. The use of a translator will isolate the programmer from
changes in the protocols. Naturally, we will increase the number of
messages available to the user when we use the new protocols, but
the old message formats will remain.

Figure 5 Protocol Interface


It is not enough to have random vehicles moving about the
battlefield without a mission; we must populate the battlefield with
combat formations that act semi-autonomously as well. The NPSNET
Mobility Expert System (NPSNET-MES) provides realistic
semi-automated forces (SAF) to introduce sufficient numbers of
unmanned players into the system to make the simulation more
challenging and exciting. NPSNET-MES consists of two components:
a  path  generation  module  and  a  vehicle  controller  module.  The  path 
generation  module  determines  the  SAF  route  and  mission  based 
upon  the  SAF  controller  input.  The  vehicle  controller  module  uses 
the  programmable  harness,  NPSNET-HARNESS,  to  multicast  data 
packets  via  Ethernet  to  control  the  SAF  vehicles  during  the  simula¬ 
tion.  NPSNET-MES  integrates  SAF  into  an  already  existing  net¬ 
work  simulator  such  that  no  changes  are  necessary  to  NPSNET. 

Problem  Description 

One of the major objectives of our work is to determine the best
approach to integrate semi-automated forces into an already
existing simulation. The following are the minimum capabilities of the
semi-automated forces: The SAF controller specifies a path that
includes start and goal points with possible way points along the
route. The SAF must negotiate all known obstacles without hitting
them in a relatively optimal path. The SAF vehicles within a SAF
formation must follow the lead SAF vehicle such that they maintain
relative positions and do not collide with each other. The SAF
controller specifies the number of combat formations as well as the
number of vehicles, speed and type of each combat formation.
When a SAF vehicle is killed, it no longer moves. NPSNET-MES
integrates the SAF into the existing NPSNET without any change
to the system. Once the SAF controller determines the SAF
prerequisite information, NPSNET-MES makes that information
available to NPSNET for use during the simulation. These basic
considerations drive the requirements for the NPSNET-MES prototype
system.


Integration  with  NPSNET 

To  get  the  desired  results,  NPSNET-MES  is  designed  to  act  in 
a  stand  alone  mode.  This  means  that  NPSNET-MES  integrates  the 
SAF  into  NPSNET  by  using  the  existing  set  of  programmable  net¬ 
work  harness  routines,  NPSNET-HARNESS.  The  main  problem 
separates into two distinct subsets: designing semi-automated
forces that can navigate and travel a specified path and transmitting the
information generated by the first part.

Path  Generation  Module 


Semi-Automated Forces

The current DARPA SIMNET system has a semi-automated
forces (SAF) component in it. The SAF system provides
autonomous players to SIMNET when sufficient numbers of actual,
interactive players are not available or affordable. The Graphics and
Video Laboratory has considerable experience in generating such
players as our visual simulation efforts have a close coupling to our
department’s artificial intelligence and robotics efforts [20,21]. We
are continuing those efforts and expanding that work to take
advantage of the available parallel processing capabilities of our
workstations.


This module is a 2D map interface that the SAF controller uses
to perform SAF vehicle placement and route selection. The SAF
controller is able to control the SAF parameters, such as number of
SAF formations, number of vehicles in each SAF formation, and
type of SAF vehicles, and to input a desired path with intermediate
rendezvous points as well as a speed for each path segment.

NPSNET-MES stores this information in a file available to the
vehicle controller module. The path selection criterion for this
module is not an optimal path; rather, it is a relatively simple path that is
found quickly. This module generates a path based on a priori
obstacle information using a circle world.


NPSNET-MES:  Overview 

Earlier  versions  of  NPSNET  used  randomly  guided  vehicles  to 
populate  the  battlefield.  These  vehicles  had  very  little  intelligence 
and  were  only  capable  of  firing  back  at  an  attacker  or  running  away. 


Computational versus a priori Path Planning

The path generation module’s path generation algorithm uses a
modified breadth-first search of a bounding box rather than the
more traditional artificial intelligence approach of a priori
generated paths because it is more efficient and less complex. The
computational approach searches the bounding box, shooting a line
between the start and goal to determine if the goal is visible from the
start. If the path has an obstacle, then the path finder is called
recursively until a path is found around obstacles enroute to the goal. The
NPSNET-MES path generation module algorithm bounds the
search area using the start and goal points to limit the search within
a box.

An a priori path generation produces paths for the entire
database, requiring a longer amount of time and more memory to store
those paths for quick access than a computational generation. The
recursive path planner grows in a linear fashion versus a non-linear
growth for the more traditional a priori method (Figure 6).


[Plot with two curves: Computational Search and a priori Search]

Figure 6 Computational vs. a priori Search

Path  Generation  Module 

The path generation module is the interface for NPSNET-MES with
NPSNET. This program places the generated paths in a sorted linked
list by ascending order of time. A path point time is a running total
time for the vehicle from the start up to that point. Using the
system clock to maintain relative time, the paths are taken off a
priority list. The NPSNET-HARNESS sends updated messages reflecting
the new vehicle position, direction, and speed to NPSNET. NPSNET
receives the path data and the SAF vehicles respond to the vehicle
controller commands, ensuring that the SAF vehicles stay on track
with the generated paths.
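The time-ordered list of path points can be sketched as follows. This is only an illustration: it assumes a binary heap in place of the sorted linked list, and the class PathQueue and its methods are hypothetical names.

```python
import heapq


class PathQueue:
    """Path points ordered by their running time from the start of the path."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal time stamps keep insertion order

    def add(self, time_s, vehicle_id, position):
        heapq.heappush(self._heap, (time_s, self._seq, vehicle_id, position))
        self._seq += 1

    def due(self, now_s):
        """Remove and return every path point whose time has been reached."""
        ready = []
        while self._heap and self._heap[0][0] <= now_s:
            time_s, _, vehicle_id, position = heapq.heappop(self._heap)
            ready.append((time_s, vehicle_id, position))
        return ready
```

A controller loop would call due() with the elapsed simulation time and send one position update per returned point.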

NPSNET-MES  Results 

NPSNET-MES provides a relatively efficient solution to finding a good
path for the SAF vehicles. The path generation module does not
attempt to find the best solution, only a good solution, since the
human that it emulates usually only finds a good solution when
conducting path planning. The vehicle control module provides the
necessary interface between NPSNET-MES and NPSNET so that the SAF
forces travel as they would in real life. The system provides a
realistic friend or foe force on the simulation battlefield.
NPSNET-MES effectively integrates SAF into NPSNET. This system is a
prototype for research; therefore, it has many potential capabilities
that can be added at a later time.


Path  Generation  Module  Limitations 

□ No dynamic path planning for the SAFs to react to other players
during the simulation.

□ Produces only one combat formation type for the entire mission.

□ Terrain slope considerations are not incorporated in the path
planning algorithm.

The most serious limitation with the system is the inability of the
SAF to react to other players in the simulation. The SAF missions are
pre-set before the simulation begins and cannot be altered once it
commences. This was a design decision made at the outset of the
project. The deficiency can be corrected by incorporating a local
path generation capability within the vehicle controller module. When
a SAF comes within range of an active player, the vehicle controller
module path generation function would generate a local path around
the moving obstacle, and the SAF would then reenter the previous path
at the closest point.

The path generation module places all follow-on vehicles in a column
of wedges. This is a good movement formation, but there are many
occasions where other formations would be appropriate. This
additional flexibility is possible by giving the SAF controller some
options during his path planning preparation.

Terrain slope considerations are not incorporated into the path
generation module because the design calls for a fast and efficient
path planner. Terrain analysis requires more computation per path
segment since the path generator evaluates each path segment terrain
slope for terrain selection.

Path  Generation  Module  Limitations 

□ Limited SAF vehicle reaction to active simulation players.

□ Projected and actual path plots deviate due to clock speed and
network transmission times.

A design decision was made early in the design phase rejecting
multiple reaction capabilities. The SAF vehicles die when attacked
because NPSNET-MES no longer sends update positions and reduces the
speed to zero. By increasing the number of items that the vehicle
controller module checks from the network, the reaction capability is
upgradeable.

The final limitation is not a serious one since deviations are small
and the shifting movement is not conspicuous. To fix the problem, the
system must be able to operate at the millisecond rate or faster
since the path points are in an ascending order queue. Some path
points may have the same time stamp, causing a delay for at least one
of the SAF vehicles. NPSNET-HARNESS is not able to operate faster
than its current rate due to hardware system limitations. The
limitations create a bottleneck because there is only a single wire
and single port on the Ethernet. There will always be some error due
to transmission time delay, but this effect is negligible as long as
the machines are in relative proximity.

Aural  Cues  for  3D  Visual  Simulation 

A realistic virtual world must include aural cues about the objects
in the world. These cues should provide feedback about the user's
environment and actions taking place. A recent addition to NPSNET is
the support of sound feedback to the user.

The addition of sound to a complex virtual world is itself complex.
Often, parallel event generated sounds are routed to sound devices
which are serial in nature. This imposes a severe limitation that
must be worked around.

One solution we are investigating involves a process that can
intelligently manage requests for sound issued from NPSNET.




This process would have several responsibilities:

□ Receive sound requests, resolve multiple similar sounds into a
single sound that can represent them and throw away requests of
significant age,

□ Coordinate requests for continuous sounds (e.g. background noise,
other vehicular noise, etc.),

□ Manage the use of multiple sound production devices (e.g. simply
keyboards, MIDI devices, etc.),

□ Facilitate the use of 3D sound.
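The first responsibility in the list above, merging similar requests and discarding old ones, can be sketched as below. This is a hypothetical illustration rather than the planned sound manager: the class SoundManager, the max_age_s parameter and the use of a sound class name as the merge key are all assumptions.

```python
class SoundManager:
    """Collapses similar sound requests and drops requests that are too old."""

    def __init__(self, max_age_s=0.5):
        self.max_age_s = max_age_s
        self.pending = {}  # sound class -> time of the most recent request

    def request(self, sound_class, time_s):
        # Similar sounds collapse into one pending entry per sound class.
        previous = self.pending.get(sound_class, time_s)
        self.pending[sound_class] = max(previous, time_s)

    def drain(self, now_s):
        """Return the sound classes to play now, discarding stale requests."""
        fresh = [name for name, time_s in self.pending.items()
                 if now_s - time_s <= self.max_age_s]
        self.pending.clear()
        return sorted(fresh)
```

Keying requests by sound class rather than by file name matches the goal of sharing only classes of sounds between the simulator and the manager.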

This sound manager process would allow NPSNET to deal with sounds in
a fairly abstract manner. Only knowledge of classes of sounds would
have to be shared between NPSNET and the sound manager. This will
allow us to modify the sound manager easily without affecting NPSNET.

Currently, sound support in NPSNET is limited. We use a Macintosh
IIci running in-house software to play digitized sound files. The
Macintosh is connected to an IRIS workstation running NPSNET by a
serial link between RS-232 ports. When NPSNET wants to produce a
sound, it issues a request for a specific sound to be played by the
Mac via the serial port. The Macintosh queues the request, then
locates and plays the sound in the system resource. There are several
limitations to this solution:

□ NPSNET must know specific sound names that exist on the Macintosh
and request them by name.

□ Currently all sound files on the Macintosh must reside in the
system folder. This limits the number of sounds that are available.

□ Only discrete sounds are currently used. There is no notion of
continuous sounds.

□ A single device with one channel is used to reproduce the sound.
This can lead to a backlog of requested sounds.

□ The queue of sound requests on the Macintosh can become overloaded
due to the above backlog. This can result in lost sounds, delayed
sounds or queue overflow.

Ongoing work with sound and NPSNET is approaching the model outlined
above. We are beginning to investigate high quality sound samplers
and MIDI devices attached to the Macintosh to collect, create and
reproduce various sounds. Sophisticated sound editing, sequencing and
control software on the Mac give us many options for creatively
employing aural feedback in NPSNET. Support for 3D sound is also
under research.

Since many sounds are object-based, NPSOFF objects will support the
description and management of the sounds that pertain to them. The
sound control with NPSOFF will provide a standard use of sounds and
facilitate the collection of sound definitions just as we collect
materials and textures.

We believe that sound is an integral part of any serious virtual
world simulation. We are actively pursuing efficient, extensible and
effective solutions to integrating sound into NPSNET.

NPSNET:  Current  Performance 

The current NPSNET system runs on a variety of platforms. Our highest
performance system in the laboratory is the Silicon Graphics, Inc.
IRIS 240 VGX with 64MB CPU memory. The VGX system is listed by the
manufacturer as being capable of some 1 million triangles per second,
z-buffered and Gouraud-shaded. On that system with terrain texturing
on, NPSNET shows 6 frames/second with many objects in the display and
9 frames/second with few visible objects. The system has a switch to
turn off texturing of the terrain, and the frame rate roughly doubles
in each case.

The performance of NPSNET is not affected by the addition of the
collision detection and response modules as currently implemented.
The response time for detection of fixed objects is adequate
regardless of the speed of the moving objects. However, for
collisions between two high speed objects, collision detection is
sometimes slow.

Fully  Interactive  and  Detailed  Virtual  Worlds 

While the NPSNET virtual world is not yet complete (and may never
be), it is still a consequential and somewhat useful system. The
NPSNET project itself is a good study of the complexity of
constructing 3D virtual worlds with available commercial technology
and why fully interactive and detailed virtual worlds are not yet
even on the horizon despite media promises. We are optimistic and
hope that by "pushing the envelope" of real-time, workstation-based
virtual reality, we are finding a way to reach the goal of a fully
interactive and detailed virtual world.

ABSTRACT

The Virtual Environment Realtime Network
(VERN) is an object oriented testbed for the
interconnection of environments over a network
of graphical workstations. VERN is based on
extensions to the networking technology of the
DARPA sponsored SIMNET combined combat
training system and the Distributed Interactive
Simulation protocol being developed as a DOD
standard. It allows for multiple participants to
interact in an environment, sharing ideas and
solving problems, regardless of their physical
locations. Furthermore, dramatic reconstructions
of historical events for education or entertainment
will be possible. Indeed, much of the impact of
VERN is likely to result from the ability of
participants to learn from each other even if they
and their machines are separated by long distances.

INTRODUCTION 

Virtual  Reality/Virtual  Environments  (VE)  describes  a 
multi-sensory  real-time  simulation  that  immerses  the 
participant  in  a  multi-dimensional  (usually  3D)  graphical 
space, allows freedom of movement within the space, and
supports  interactions  including  the  modification  of  most 
features  of  the  space  itself  [10,13].  Additionally,  a  VE 
system  may  include  modeling  tools  for  world 
construction,  rendering  tools  for  viewing,  storage 
mechanisms  for  saving  memorable  experiences,  I/O 
devices  for  controlling  aspects  of  the  space  and 
communication  ports  for  shared  environments. 

Recently, research in the VE field has turned its
attention to networking issues for shared experiences.
Two phases must be considered: rendering (distribution of
graphical data) and computation (distribution of the
physical model). The Visual Systems Laboratory (VSL) at
IST is currently working on both of these problems.

Our efforts have produced two software systems:
ANIM and VERN. ANIM is an interactive graphical
simulation system with support for devices like
SpaceBalls and gloves (VSL Input Paw). Modeling tools,
such as Alias (high end rendering tool, Alias Research),
MultiGen (tool for CIG databases, Software Systems) and
S1000 (SIMNET's CAD system, BBN) are used to build
environments which are inputs for the system. ANIM has
been extended using VERN protocols and can now operate
on several computers, distributing the computations of the
objects as well as distributing the space itself. This paper
will focus on VERN and how systems like ANIM can use
VERN to distribute virtual objects, computational load and
user interactions across multiple simulation platforms.

Simulation  Network  (SIMNET)  [7,12]  is  a  project 
sponsored  by  the  Defense  Advanced  Research  Projects 
Agency  (DARPA)  and  was  designed  and  built  by  BBN 
Laboratories  Inc.  and  Perceptronics  Inc.  It  allows  for 
collective  team  training  in  combined  arms  scenarios.  All 
of the simulators are networked via Ethernet and the
communication model is based on the “dead reckoning”
paradigm  [8].  VE  applications  are  a  far  more  demanding 
simulation  than  SIMNET,  because  in  a  truly  useful  virtual 
world,  every  object  is  dynamic.  In  traditional  simulators, 
only  a  small  collection  of  moving  objects  can  be 
maintained. 

As  a  follow-on  to  the  homogeneous  SIMNET  system, 
the  US  Army  has  explored  the  possibility  of  expanding 
these  concepts  to  address  the  networking  of  large  numbers 
of  dissimilar  training  devices.  The  next  important  step  in 
this  research  is  the  development  of  a  standard 
communications  protocol  for  Distributed  Interactive 
Simulations  (DIS)  [8]. 

Interactive  simulations  in  the  SIMNET  and  DIS  worlds 
perform  computations  and  communicate  by  a  dead 
reckoning  model.  Each  object  in  the  simulation  has  a  host 
machine  which  will  process  its  dynamics.  All  other 
machines  have  representations  of  the  object  which 
maintain  an  approximation  to  the  current  state  of  the 
object.  The  approximation  of  a  simulation  object's  state 
is  computed  by  a  dead  reckoning  algorithm.  This 
computation  is  usually  an  extrapolation  of  the  object's 
position  based  on  velocity.  When  the  host  object  realizes 
that  the  dead  reckoning  model  has  deviated  significantly 
from  the  dynamic  model  (probably  because  of  user  input), 
an  update  message  is  sent  to  all  other  representations  of 
the  object  on  every  other  machine. 
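The dead reckoning model described above can be sketched in a few lines. This is a minimal illustration assuming linear extrapolation from velocity and a Euclidean error threshold; the names extrapolate, HostObject and threshold are hypothetical and do not come from SIMNET or DIS.

```python
import math


def extrapolate(position, velocity, dt):
    """Dead-reckoned position after dt seconds at constant velocity."""
    return tuple(p + v * dt for p, v in zip(position, velocity))


class HostObject:
    """Runs on the object's host machine and decides when to send updates."""

    def __init__(self, position, velocity, threshold):
        self.threshold = threshold
        # State last sent to the other machines, and when it was sent.
        self.sent_position = position
        self.sent_velocity = velocity
        self.sent_time = 0.0

    def step(self, true_position, true_velocity, now):
        """Return an update message if the remote estimate has drifted, else None."""
        predicted = extrapolate(self.sent_position, self.sent_velocity,
                                now - self.sent_time)
        if math.dist(predicted, true_position) > self.threshold:
            self.sent_position = true_position
            self.sent_velocity = true_velocity
            self.sent_time = now
            return {"position": true_position, "velocity": true_velocity,
                    "time": now}
        return None
```

The host repeats the same extrapolation that the remote machines perform, so it can tell without any network traffic when their approximation is no longer acceptable.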

DESCRIPTION OF VERN v1.2

VERN v1.2 was developed to meet the needs of the
simulation community as a vehicle for development of
networked environments as well as to break new ground in
the development of interactive VE systems. This
implementation is an extensible object oriented class
hierarchy where the communications, dead reckoning and
process control are abstracted to the highest levels. Most
importantly, VERN extends the notion of dead reckoning
into a distributed physical model.

VERN evolved from a non-realtime Smalltalk-80
prototype [2,3,4]. Version 1.2 is implemented in C++ and
currently runs on Silicon Graphics and Sun Sparc UNIX
systems.

The  communications  protocol  forms  the  software 
basis  for  an  environment  that  will  support  experiments 
with  a  network  of  visual  simulators  operating  in  a  single 
simulation.  This  environment  will  contain  dynamic  and 
static objects. For example, terrain over which objects
move  may  be  dynamic  while  buildings  in  a  city  may  be 
static.  Objects  in  the  simulation  communicate  with  each 
other  without  having  to  know  the  host  machine  on  which 
the  receiving  object  resides.  Each  object  assumes  that  all 
objects  are  in  its  own  local  memory.  Under  the  VERN 
protocol,  messages  bound  for  remote  objects  are 
intercepted  and  routed  accordingly. 

Players  and  Ghosts 

Each  real  world  object  participating  in  the  simulation  is 
represented by a software object called a Player. The
Player  resides  on  the  object's  home  machine.  If  human  or 
external  input  is  required  by  the  Player,  the  data  is  read  and 
processed  on  the  Player's  home  machine.  The  main 
responsibility  of  the  Player  is  to  accurately  maintain  state 
information,  read  and  process  inputs,  provide  feedback 
usually  in  the  form  of  real-time  graphics,  and  inform  the 
network  of  any  significant  state  changes  that  deviate  from 
the  dead  reckoning  model. 

In  order  to  facilitate  communication  between  Players 
residing  on  separate  machines,  each  Player  has  an 
associated  Ghost  located  on  every  machine  involved  in  the 
simulation.  Thus  in  an  N  Player  simulation  on  M 
networked  machines,  each  machine  is  guaranteed  to  have 
exactly  N  objects  representing  all  players.  Such  a 
configuration  allows  Players  to  communicate  locally  with 
any  other  Player  (represented  by  its  Ghost).  It  is  the 
responsibility  of  the  Ghost  either  to  respond  directly  to 
the  message,  or  to  forward  it  to  the  actual  Player. 

Ghosts are approximations of their associated
Players. That is, the state of a Ghost is not always as
precise (algorithmically) as the Player's, but this
approximation is adequate for visualization and dynamics.
All  Ghosts  that  are  associated  with  a  single  Player  are 
synchronized  at  any  given  instant  in  simulation  time 
through  the  use  of  the  system  clock,  message  passing  and 
dead  reckoning.  When  the  Player  realizes  that  its  Ghosts 
are  going  to  be  inaccurate,  the  Player  then  communicates 
the  correct  state  information  to  all  Ghosts. 

Message  Types 

There are two types of messages to which Players and
Ghosts respond: queries and commands. Queries are
messages which can be processed entirely by the Ghost.
Commands  are  messages  that  must  be  passed  on  to  the 
Player.  Thus,  a  message  that  requests  state  information 
would  be  considered  a  query  while  a  change  of  behavior 
message  would  be  a  command. 
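The split between queries and commands can be sketched as follows. This is not VERN code: the classes, the dictionary-based state and the forward_to_player callable (which stands in for the network hop to the home machine) are hypothetical.

```python
class Player:
    """Authoritative object living on its home machine."""

    def __init__(self, name, position):
        self.state = {"name": name, "position": position, "behavior": "idle"}

    def handle_command(self, name, value):
        self.state[name] = value
        return "ok"


class Ghost:
    """Local stand-in holding an approximate copy of the Player's state."""

    def __init__(self, player_state, forward_to_player):
        self.state = dict(player_state)
        self.forward_to_player = forward_to_player

    def receive(self, kind, name, value=None):
        if kind == "query":
            return self.state[name]  # answered entirely by the Ghost
        if kind == "command":
            return self.forward_to_player(name, value)  # passed on to the Player
        raise ValueError(f"unknown message kind: {kind}")
```

In this sketch the Ghost's own copy of the state only changes when the Player later sends an update, which is why a command does not touch it directly.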

Class  Hierarchy 

VERN v1.2 was designed using the object oriented
paradigm. The classes that comprise the highest levels of
the hierarchy contain the code for handling all of the
communications and process control protocols. This
hierarchy is considered a white box framework [6] because
the user (programmer) of the system must follow the
structures that the abstract classes establish. Figure 1
shows the abstract class hierarchy of VERN v1.2.


Object
    AbstractVERNObject
        AbstractPlayer
        AbstractGhost
    Router
    AbstractState

Figure 1. Class Hierarchy for VERN v1.2

There  are  additional  classes  not  shown  here  which 
represent  communication  support  structures  such  as 
mailboxes,  addresses,  and  sockets.  The  following 
describes  each  of  the  abstract  classes. 

class AbstractVERNObject:

This class contains the virtual methods
which handle actions to be performed in each
simulation loop. For example, initializations,
maintenance of the local mailbox (repository for
messages), and access to the state information.

class AbstractPlayer:

This  class  defines  the  basic  components  of 
the  simulation  Player.  Virtual  methods  in  the 
class  are  used  to  support  such  activities  as 
processing  of  incoming  messages,  internal  state 
configuration  and  message  creation. 

class AbstractGhost:

This class defines the “view” of a Player as
seen  by  other  local  and  remote  Players.  A 
simulation  Player  located  on  a  workstation  can 
communicate  with  another  Player  only  through 
its  AbstractGhost.  An  instance  of  this  class 
contains  limited  state  information  which  is  useful 
to  other  Players.  When  the  state  of  the  Player 
changes  significantly  from  the  dead  reckoned 
state,  a  message  is  sent  to  all  AbstractGhosts  to 
reflect  the  new  value. 

class AbstractState:

This class defines the state variables used in
the AbstractPlayer. Each implementation will
inherit  from  this  class  and  use  it  as  a  guide.  The 
class  AbstractVERNObject  has  an  instance  of 
AbstractState  as  one  of  its  instance  variables. 

IMPLEMENTATION  DETAILS 

To  write  a  Player/Ghost  program,  the  programmer  must 
create  concrete  subclasses  of  the  abstract  classes  listed 
above.  For  example,  consider  the  definition  of  a  moving 
ball.  The  classes  that  must  be  created  are 
MovingBallPlayer (subclass of AbstractPlayer),
MovingBallGhost (subclass of AbstractGhost), and
MovingBallState (subclass of AbstractState). These new
classes  must  then  be  compiled,  linked  and  executed. 
Further  examples  of  Players  may  be  found  in  [2]. 

The first classes that must be created are subclasses of
AbstractPlayer and AbstractGhost. There are two methods
that must be reimplemented in the new Player. These are
processMsg and computeNextState. The Player must also
have a constructor method to create instances.


Method  :  constructor 

The purpose of constructor methods in C++
is to provide a default way to instantiate new
instances  of  a  class.  In  our  case,  a  string 
containing  the  name  of  the  Player  is  the  required 
parameter.  The  main  function  of  the  constructor 
is  to  initialize  the  state  instance  variable. 

Method  :  processMsg 

Since C++ does not internally support
machine  to  machine  communications,  a  low  level 
messaging  system  is  necessary.  Support  for 
sending  raw  packets  of  data  between  UNIX 
processes  has  been  supplied.  The  responsibility 
of  creating  and  interpreting  the  raw  data  is  left  to 
the  Player. 

The  purpose  of  processMsg  is  to  interpret 
and  respond  to  incoming  messages.  It  is 
important to note that messages may arrive from
many  different  Players.  Each  raw  message 
contains  the  source,  destination,  data  and  type. 

Method : computeNextState (for Player)

This method serves two purposes. The first
is  to  perform  any  internal  processing  which 
might  be  required  by  the  Player.  For  example, 
calculate  new  position  and  velocity  based  on 
current  simulation  time.  The  second  purpose  of 
this  method  is  to  update  the  state  information  of 
the  Player. 

Method : computeNextState (for Ghost)

The  objective  of  this  method  is  to  compute 
the  Ghost's  approximate  state  model.  The  Ghost 
determines  the  next  state  of  the  player,  without 
any  additional  information  coming  from  the 
player.  This  is  how  dead  reckoning  is 
implemented  within  VERN.  Each  Ghost  performs 
this  message  once  each  simulation  loop. 
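The moving ball example can be sketched as below. VERN itself is written in C++; Python is used here only for brevity, and everything beyond the class and method names taken from the text (the one-dimensional state, the set_velocity message type, the dt argument) is an assumption.

```python
from abc import ABC, abstractmethod


class AbstractPlayer(ABC):
    def __init__(self, name):
        self.name = name  # the constructor takes the Player's name

    @abstractmethod
    def processMsg(self, msg):
        """Interpret and respond to one incoming raw message."""

    @abstractmethod
    def computeNextState(self, dt):
        """Do internal processing and update the Player's state."""


class AbstractGhost(ABC):
    @abstractmethod
    def computeNextState(self, dt):
        """Advance the approximate (dead-reckoned) state."""


class MovingBallPlayer(AbstractPlayer):
    def __init__(self, name):
        super().__init__(name)
        self.x, self.vx = 0.0, 1.0

    def processMsg(self, msg):
        # Each raw message carries source, destination, data and type.
        if msg["type"] == "set_velocity":
            self.vx = msg["data"]

    def computeNextState(self, dt):
        self.x += self.vx * dt


class MovingBallGhost(AbstractGhost):
    def __init__(self, x, vx):
        self.x, self.vx = x, vx

    def computeNextState(self, dt):
        # No new information from the Player: extrapolate from the last state.
        self.x += self.vx * dt
```

Once the Player changes velocity, the Ghost keeps extrapolating with the old value and drifts away until an update message arrives.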

In order to facilitate complete freedom in defining
state information of a Player's object, an AbstractState
was created. This abstract class provides default
definitions of methods that must be reimplemented. It
defines no instance variables. This means that the
concrete Player class must define and maintain all of its
own  instance  variables.  The  main  methods  in  this  class 
are comparison operators such as == and !=, mathematical
operators such as + and -, and the assignment operator =.
There are no other restrictions placed on the addition of
subclasses.
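A concrete state class with comparison and mathematical operators might look like the following sketch. It is hypothetical: the two-field MovingBallState and the choice of + and - are assumptions. In Python the dataclass decorator generates ==, and != follows from it; assignment needs no method.

```python
from dataclasses import dataclass


@dataclass
class MovingBallState:
    """Hypothetical concrete state: the subclass defines its own variables."""
    x: float = 0.0
    y: float = 0.0

    def __add__(self, other):
        return MovingBallState(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        return MovingBallState(self.x - other.x, self.y - other.y)
```

One plausible use of subtraction is to take the difference between a Player's state and its dead-reckoned state before comparing it with a threshold.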

EXECUTING  THE  SIMULATION 

Previous  versions  of  the  VERN  used  a  synchronized  clock 
as  the  simulation  coordinator.  Using  this 
synchronization  system  enabled  the  state  of  the  Player  to 
know  (via  a  local  dead  reckoning)  the  Ghost's  exact  state 
at  every  tick  of  the  clock.  Although  this  is  important,  it 
can  be  accomplished  using  the  computers’  real-time 
clocks.  This  allows  each  computer  to  execute  as  fast  as 
possible  and  it  also  reduces  the  communications  overhead 
of  clock  maintenance. 

The function of the Router is to maintain the
connections to the outside world, maintain a list of active
local and global Players, and route messages according to
their source and destination. All of the routers know the
locations of the other routers and the addresses of all
objects. This global information allows the router to
make decisions about the direction of the message. The
Router's main loop asks each of the local objects to run
one simulation cycle. During this cycle, objects execute
the inherited methods above.

Figure 2. Process Architecture of VERN v1.2


Additionally,  the  Router  can  report  the  current 
simulation  configuration  and  detect  simulation  errors. 
When  a  Player  leaves  the  simulation,  the  Router 
immediately  realizes  which  Player  is  missing  and  then 
reports  this  to  all  Routers  in  the  system.  Figure  2  shows 
the overall system architecture of VERN v1.2.
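The Router's routing decision and its handling of a departing Player can be sketched as follows. This is an illustration only: mailboxes are plain lists, a direct method call stands in for the network send, and all names are hypothetical.

```python
class Router:
    def __init__(self):
        self.mailboxes = {}  # local object name -> list of pending messages
        self.remote = {}     # remote object name -> Router that hosts it

    def add_local(self, name):
        self.mailboxes[name] = []

    def add_remote(self, name, router):
        self.remote[name] = router

    def route(self, msg):
        """Deliver locally if possible, otherwise pass to the hosting Router."""
        dest = msg["destination"]
        if dest in self.mailboxes:
            self.mailboxes[dest].append(msg)
        elif dest in self.remote:
            self.remote[dest].route(msg)  # stands in for a network send
        else:
            raise KeyError(f"unknown destination: {dest}")

    def remove_player(self, name, all_routers):
        """Drop a departed Player and report it to every Router."""
        self.mailboxes.pop(name, None)
        for router in all_routers:
            router.remote.pop(name, None)
```

Because every Router holds the full address table, a sender never needs to know which machine hosts the destination object.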

There  are  additional  system  functions  worth 
mentioning. An automatic update has been added. This
forces  the  Player  to  update  its  Ghost  at  a  specified  interval 
(usually 3-5 seconds) even if no update is needed. This
function  is  useful  when  the  communication  system  drops 
packets.  Using  this  function  provides  for  reliable 
Player/Ghost synchronization.

An additional system parameter is called “dynamic
update.” Dead reckoning algorithms have a base threshold
on which an update is based. The dynamic update is
another threshold which provides the user with some
control. The dynamic update threshold specifies the
amount of error tolerated in the dead reckoning algorithm. For
example,  if  the  user  is  interacting  with  the  environment  at 
a  detailed  level,  then  the  dynamic  update  will  be  set  to  a 
small  value,  resulting  in  accurate  synchronization  between 
Player  and  Ghost. 
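The automatic update and the dynamic update threshold can be combined into one decision, sketched below. The function name, its parameters and the returned labels are hypothetical; only the two rules themselves come from the text.

```python
def update_reason(error, seconds_since_update, dynamic_threshold, auto_interval_s):
    """Return why the Player should update its Ghosts now, or None."""
    if error > dynamic_threshold:
        return "threshold"   # dead reckoning error exceeds the user-set limit
    if seconds_since_update >= auto_interval_s:
        return "automatic"   # periodic refresh guards against dropped packets
    return None
```

A smaller dynamic_threshold yields tighter synchronization at the cost of more update messages.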

One last feature is called “update tracking.” When a
Ghost receives an update message from the Player, usually
the position has changed significantly. If the update
tracking is set to "jump", then the object will disappear
from its current location and reappear at its updated
location, causing a visual disturbance. If the update
tracking is set to "smooth", then the object will track
evenly to its new position. This tracking will occur over a
number of frames and the amount of smoothing can be set
as a system parameter.

ISSUES FOR DISCUSSION

There  are  many  issues  that  arise  in  research  projects  of  this 
nature. It is useful to note that VERN v1.2 is only one part
of  a  larger  project  to  develop  VEs,  and  its  main  purpose  is 
to  show  proof  of  concept.  Below  are  a  few  interesting 
topics  that  emerged  from  this  implementation. 

Communications 

The  base  communications  between  different  machines 
is  accomplished  with  a  “broadcast”  UNIX  socket. 
Broadcast  sockets  distribute  their  packets  to  anyone 
listening.  Socket  communications  using  the  broadcast 
mechanism  are  not  guaranteed:  messages  sent  may  not 
reach  their  final  destination.  Additionally,  broadcast  is 
convenient  for  local  communications  but  may  not  be 
useful  in  long  haul  systems.  By  experiment,  the 
performance  advantages  of  broadcast  messages  outweigh 
the  risk  of  occasional  lost  messages.  Point-to-Point 
sockets  are  the  main  means  for  long  haul  communication. 
VERN  has  been  tested  over  private  communication  lines  as 
well  as  on  the  nation-wide  Internet. 

It  should  be  noted  that  the  lowest  level 
communications  were  written  in-house.  There  was  a  study 
of  language  based  communications  systems,  such  as  those 
that support TimeWarp and Actor [5,1]. It was determined
that  these  systems  are  useful,  but  our  need  to  learn  and 
experience  workstation  based  communications  outweighed 
their  use.  Implementing  VERN  using  an  Actor  or 
TimeWarp  paradigm  is  possible  and  is  part  of  our  future 
research. 

Object Oriented Design Using C++

This is probably one of the most interesting parts of the
VERN. One of the main arguments against the use of C++
as the base language for the VERN is that it does not fully
support  polymorphism.  Dynamic  binding  of  method  calls 
is  restricted  in  some  cases  because  of  the  strict  type 
checking.  Since  the  design  of  this  project  incorporates 
abstract  classes,  a  language  with  flexible  support  of 
dynamic  binding  and  type  checking  would  be  more 
suitable.  Smalltalk  would  be  a  suitable  alternative  and 
may  solve  some  of  these  problems,  but  it  is  not  yet 
available  on  a  wide  variety  of  workstations. 

Performance 

The performance of VERN has been measured and a detailed
description of experiments can be found in [3]. Currently,
running on a network of 2 Sun Sparcs and 4 Silicon
Graphics workstations, VERN v1.2 can achieve 300-350
frames/second. The test environment consisted of 5 balls
bouncing in a closed box. We have conducted extensive
experiments on non-trivial environments and the results
are encouraging. We expect frame rates of 5-10 per second
with an environment consisting of 1000 objects (~10k
polygons).

Future  Directions  of  VERN 

The major goals of the next version are to improve
efficiency, investigate other distributed simulation
systems, experiment with extended environments and
continue work on long haul communications.

The design of this project represents only one limited
view of VE system development. Frameworks, like VERN,
need to be combined with other object oriented systems to
form complete VE systems. This project and others, like
ANIM, are likely to pave the way to robust systems.
These new VEs will contain physical modeling, real time
control of objects, decentralized clocks and spatial
division of computations in an object oriented framework.
The next level of research for this project will look at
these issues to determine commonality and reusability
which will extend the functionality of the entire system.

Abstract

The Mirage system is an object-oriented framework for constructing
interactive visual applications. It takes a model-based approach to
application development by providing a representation system for
graphics, interaction, and time-based dynamics. This paper will
provide a brief overview of the architecture, examples of its use,
and a comparison to alternative approaches.

1.0  Introduction 

The goal of this work is to create a foundation for animated,
interactive 3D graphics that will reduce the time and expertise
required to produce visual applications. Our hypothesis is that a
model-based approach to application development provides significant
advantages over more conventional procedural programming techniques.

To test this hypothesis, we have developed a foundation for
interactive 3D graphics that supports this paradigm. It combines
elements of object-oriented programming and frame-based knowledge
representation to provide the functionality of window systems,
graphics systems and animation systems.

2.0 Modeling Methodology

The modeling process begins by describing elements of an application
domain in terms of the primitive elements of the representation
system. These new classes of objects are domain-specific
primitives that map their domain attributes onto graphical
attributes. They are then used to create models in the desired
domain. As the domain attributes change through the evaluation or
simulation of the model, the effects propagate down to the low-level
graphical attributes. The resulting structure is then interpreted to
produce the desired visual presentation.

In Mirage, modeling is accomplished through the use of a
representation system. The purpose of the representation system is to
provide a framework for describing the elements and behavior of a
system in a modular, declarative style. The developer creates a
model by manipulating elements and attributes of the representation
system. The representation framework then provides the necessary
infrastructure to handle rendering, flow of control and event
management. The intent is to hide details, such as how rendering is
to be done, and focus instead on what the result is to be.

The representation system employed resembles a simple frame-based
knowledge representation system [6]. The primary elements
are:

•  classes  and  instances  of  objects, 

•  object  attributes  and  values, 

•  operations  on  objects,  and

•  relations  between  objects. 

Models are created by making instances of objects, setting the
attribute values of the objects and composing the objects via various
relationships, such as a "component" relation.
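
A minimal sketch of this style of model construction follows. It is
written in Python for brevity rather than in the C++ of the prototype,
and every name in it (`Frame`, `relate`, `add_component`, `walk`,
`Wheel`) is hypothetical; it illustrates instances, attribute values
and a "component" relation, not Mirage's actual interface.

```python
class Frame:
    """A minimal frame: a named instance with attribute slots and relations."""

    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = dict(attributes)   # attribute name -> value
        self.relations = {}                  # relation name -> list of Frames

    def set(self, attribute, value):
        self.attributes[attribute] = value
        return self

    def relate(self, relation, other):
        self.relations.setdefault(relation, []).append(other)
        return self

    def add_component(self, part):
        return self.relate("component", part)

    def walk(self, depth=0):
        """Yield (depth, frame) pairs by following the component relation."""
        yield depth, self
        for part in self.relations.get("component", []):
            yield from part.walk(depth + 1)


class Wheel(Frame):
    """A domain-specific class (hypothetical) built on the primitive Frame."""

    def __init__(self, name, radius):
        super().__init__(name, radius=radius)
        # The domain attribute (radius) is mapped onto a graphical one (scale).
        self.set("scale", (radius, radius, radius))


car = Frame("car", location=(0.0, 0.0, 0.0))
car.add_component(Wheel("front-wheel", radius=0.3))
car.add_component(Wheel("rear-wheel", radius=0.35))

# An "interpreter" would traverse the structure; here it is just listed.
outline = [(depth, frame.name) for depth, frame in car.walk()]
```

The model is only data plus relations; how it is drawn is left to
whatever traverses it, which is the separation the text describes.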

The graphics representation system of Mirage is where an application
specifies what is to appear on the screen, while the renderers
interrogate the model and perform the hardware-specific operations
required to present an interpretation of the model on the
screen (see Figure 1). The final result of interpretation depends
upon the specific type of interpreter being used and upon the
resources available to the interpreter. For example, different graphics
interpreters may behave differently depending upon the specific
architecture of the graphics subsystem they are using or the degree of
realism desired. Similarly, a different type of interpreter may
produce an interpretation showing a part/whole structure diagram
rather than the literal appearance of the model elements.

Figure 1. Representation and interpretation

The advantages of the modeling approach to graphics are:

•  it reduces complexity via a declarative, constructive style of
usage,

•  it supports a variety of rendering styles and platforms,

•  it is usable either by programming or through knowledge-based
or interactive tools.

3.0 The Mirage System

A representation system for 2D and 3D graphical presentations has
been defined and a prototype (Mirage) has been implemented in
C++ under Unix for both the Silicon Graphics Inc. Graphics Library
(SGI-GL) and for X Windows/PEX. This prototype has been
used successfully to construct interactive applications including
scientific visualizations for the Superconducting Super Collider
Laboratory and a virtual reality retail shopping system for the
NCR Corporation. The system consists of the following elements:

•  a  graphics  substrate  that  supports  interactive  3D  graphics  in 
a  heterogeneous  networked  environment, 

•  a  temporal  representation  that  allows  the  dynamic  aspects 
of  a  system  to  be  specified,  and 

•  an  event-manager  for  describing  the  cause-effect  behavior 
of  the  system  due  to  user  and  system  interactions. 

3.1  Graphics 

The representational framework for static graphics combines
hierarchical graphics, object-oriented programming, and frame-based
knowledge representation techniques. This framework has been
described in more detail in [9,10]. The class lattice of Figure 2
presents the graphical classes. The class Form represents objects
with spatial attributes such as location, scale, orientation and
shape. Each Form defines a local coordinate system in space.
Forms may be combined hierarchically. Forms by themselves
have no direct appearance but instead have a shape attribute which
may be filled by one or more instances of a Shape sub-class.

Figure 2. Classes (Form, with sub-classes Camera, Window and Light;
0d_Shapes (points, point_sets, ...), 1d_Shapes (lines, polylines, ...),
2d_Shapes (polygons, images, ...), 3d_Shapes (platonic_solid,
quadmesh, ...))

Figure 3 shows how instances of these classes can be used to create
a visual presentation. From the top down, an instance of class
Window is viewing the "world" through an instance of class Camera.
The "world" is an aggregation of Forms where the part-whole
structure is defined by the Component relation. The result of
interpreting this structure is shown in the upper right corner of
Figure 3 as an image on a workstation display.

Figure 3.

In this framework, Windows, Cameras and Lights all inherit from
the Form class, so that they may be placed in a scene like any other
graphical object. As a result, synthetic cameras which may be
attached to other objects as shown in Figure 4 are directly supported.

Figure 4. Synthetic camera example

In this example, the display on the left shows an external view of
the scene, while the display on the right shows the pilot's view
from the cockpit. In a similar fashion, a Window may be placed
in a scene and then treated as a Form with the difference being that
its appearance is determined by the Cameras and scenes it is viewing.

The preliminary results have shown that having Camera and Window
be sub-classes of Form makes interface composition both
simpler and more flexible than with the more traditional approaches.
This follows from the fact that all objects have a common subset
of attributes and behaviors, and there are very few constraints
on how they can be combined. The result is a simple, consistent
model of how graphical scenes are described.

Furthermore, by making the representation declarative and by
enforcing the separation between representation and interpretation, a
high degree of display-architecture independence and support of
multiple display presentations are possible. The resulting
architecture appears well suited to supporting dynamic, interactive
graphics in a networked environment.
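
The idea that a Camera is just another Form can be sketched in a few
lines. The Python below is an illustration under simplifying
assumptions, not the Mirage class library: Forms carry only a
translation, and the names `add`, `world_location`, `pilot_view` and
`external_view` are invented for the example.

```python
class Form:
    """An object with a local coordinate system; translation only, for brevity."""

    def __init__(self, name, location=(0.0, 0.0, 0.0)):
        self.name = name
        self.location = location
        self.parent = None
        self.components = []

    def add(self, child):
        child.parent = self
        self.components.append(child)
        return child

    def world_location(self):
        """Accumulate local offsets up the component hierarchy."""
        x, y, z = self.location
        if self.parent is not None:
            px, py, pz = self.parent.world_location()
            x, y, z = x + px, y + py, z + pz
        return (x, y, z)


class Camera(Form):
    """A camera is a Form, so it can be a component of any other Form."""


class Window(Form):
    """A window is a Form whose appearance comes from the camera it views through."""

    def __init__(self, name, camera, location=(0.0, 0.0, 0.0)):
        super().__init__(name, location)
        self.camera = camera


world = Form("world")
plane = world.add(Form("plane", location=(100.0, 50.0, 0.0)))
pilot_camera = plane.add(Camera("pilot-camera", location=(2.0, 1.0, 0.0)))
external_camera = world.add(Camera("external-camera", location=(0.0, 0.0, 200.0)))

pilot_view = Window("pilot-view", pilot_camera)
external_view = Window("external-view", external_camera)

# Moving the plane moves the attached camera with it; no camera code changes.
plane.location = (110.0, 50.0, 0.0)
eye = pilot_view.camera.world_location()
```

Because the camera inherits placement from Form, the cockpit view
follows the plane without any special case for cameras.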


3.2  Animation 

Time-dependent behavior is represented using a framework similar
to that used in the graphics sub-system. The dynamic behavior
of objects is defined by constructing a graph-based representation
of temporal objects. The nodes in the graph, Activities (see Figure
5), represent objects with a temporal extent, that is, ones that
exist over some interval of time. An Activity also defines a
temporal coordinate system. The actual behavior of an Activity is
specified by specializing the Activity class and expressing the
time-dependent behavior procedurally for the new sub-class. For
example, Figure 6a shows time being mapped onto the rotation
attribute of a wheel so that as time goes from 0 to 1, the wheel
rotates 360 degrees. This behavior is represented by the Activity
"rotating-wheel". Similarly, Figure 6b shows time being mapped
onto translation of the wheel, to produce the effect of the wheel
moving a distance equal to its circumference over 1 unit of time
("moving-wheel" activity).

Figure 5. Temporal classes

Next, a hierarchical temporal coordinate system is introduced in
which each node in the graph corresponds to a temporal activity of
some duration and which acts as a time-frame for all sub-activities.
Complex activities are then created by composing simpler
sub-activities under a parent activity. The result is that Activities
may be composed hierarchically in the same way that graphical objects
are. Figure 6c shows the result of combining the rotating-wheel
Activity with the moving-wheel Activity to define a new, more
complex "rolling-wheel" Activity.

An Activity defines a one-dimensional coordinate system for time.
Activities can be "scaled" and "translated" in time in much the
same way that the graphical elements (Forms) are manipulated in
space. Scaling an Activity affects the rate and duration of the
Activity, while translating an Activity defines when the activity will
occur relative to its parent's time-frame.

Figure 7 shows three variations of the "Rolling-Wheel" Activity.
In Figure 7a, the wheel rolls one complete turn in one unit of
time. In Figure 7b, the Rolling-Wheel Activity has been scaled by
0.5 relative to its parent activity (not shown) and as a result
occupies one half the time as before and therefore rolls twice as
fast. In the third case, Figure 7c, each of the sub-activities
(Rotating-Wheel and Moving-Wheel) has been scaled by 0.5 and the
Moving-Wheel activity has been translated 0.5 time units. The behavior
here is that during the first 0.5 time units of the Rolling-Wheel
Activity, the wheel rotates one complete turn, and then during the
second 0.5 time units, the wheel moves a distance equal to its
circumference.
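
The three rolling-wheel variations can be reproduced with a small
sketch of hierarchical time. This is an illustrative Python
reconstruction, not the Mirage implementation: it assumes that a
child's local time is (parent time - offset) / scale and that a leaf
clamps its local time to the interval 0 to 1. The names `Activity`,
`sample`, `rolling_wheel` and `state_at` are hypothetical.

```python
import math


class Activity:
    """A node with its own 1D time frame inside the parent's time frame."""

    def __init__(self, scale=1.0, offset=0.0):
        self.scale = scale      # duration relative to the parent's time unit
        self.offset = offset    # start time in the parent's time frame
        self.children = []

    def add(self, child):
        self.children.append(child)
        return self

    def local_time(self, parent_time):
        return (parent_time - self.offset) / self.scale

    def sample(self, parent_time, state):
        t = self.local_time(parent_time)
        self.apply(min(max(t, 0.0), 1.0), state)   # clamp to the activity's extent
        for child in self.children:
            child.sample(t, state)

    def apply(self, t, state):
        pass   # a purely composite activity has no behavior of its own


class RotatingWheel(Activity):
    def apply(self, t, state):
        state["rotation_deg"] = 360.0 * t


class MovingWheel(Activity):
    def __init__(self, radius, **kw):
        super().__init__(**kw)
        self.circumference = 2.0 * math.pi * radius

    def apply(self, t, state):
        state["x"] = self.circumference * t


def rolling_wheel(sub_scale=1.0, move_offset=0.0, scale=1.0):
    roll = Activity(scale=scale)
    roll.add(RotatingWheel(scale=sub_scale))
    roll.add(MovingWheel(radius=1.0, scale=sub_scale, offset=move_offset))
    return roll


def state_at(activity, time):
    state = {}
    activity.sample(time, state)
    return state


case_a = rolling_wheel()                                  # one turn per time unit
case_b = rolling_wheel(scale=0.5)                         # whole activity twice as fast
case_c = rolling_wheel(sub_scale=0.5, move_offset=0.5)    # rotate first, then move
```

Sampling `case_c` at time 0.25 gives half a turn and no translation,
while sampling it at 0.75 gives a full turn and half a circumference
of travel, matching the third case described in the text.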

The behavior of an Activity can be described using a simple state
machine (see Figure 8). The transitions between states are
determined by the change in local time between samples and invoke
methods on the Activity. For some types of activities, the methods
triggered by state transitions may be no-ops.

Figure 8. Activity state transition model


The  behavior  of  an  Activity  is  procedurally  specified  through  the 
following  methods: 

•  Start - Begin execution of activity, create and initialize
any resources required for the activity.

•  Update - Advance activity to new local time.

•  Terminate - Finish activity and release any resources no
longer needed.

•  Reset - Reset activity to initial state.
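
How changes in local time might trigger these four methods is sketched
below. The state names ("inactive", "active", "finished") and the
transition rules are assumptions made for illustration, since the
state diagram itself is not reproduced here; only the method names
come from the list above, and `TracedActivity` and `advance` are
hypothetical.

```python
class TracedActivity:
    """Records which methods a driver invokes; real bodies may be no-ops."""

    def __init__(self):
        self.state = "inactive"      # assumed state names, not taken from Figure 8
        self.calls = []

    def start(self):
        self.calls.append("start")

    def update(self, t):
        self.calls.append(("update", t))

    def terminate(self):
        self.calls.append("terminate")

    def reset(self):
        self.calls.append("reset")

    def advance(self, t):
        """Move to local time t, firing the methods implied by the transition."""
        inside = 0.0 <= t <= 1.0
        if self.state == "active" and not inside:
            self.update(min(max(t, 0.0), 1.0))   # land exactly on the boundary
            self.terminate()
            self.state = "finished" if t > 1.0 else "inactive"
        elif self.state != "active" and inside:
            if self.state == "finished":
                self.reset()                      # time moved back into range
            self.start()
            self.update(t)
            self.state = "active"
        elif self.state == "active":
            self.update(t)
        elif self.state == "finished" and t < 0.0:
            self.reset()
            self.state = "inactive"


activity = TracedActivity()
for local_time in (-0.5, 0.25, 0.75, 1.5, 0.5):
    activity.advance(local_time)
```

The last sample jumps backwards from 1.5 to 0.5, which in this sketch
causes a Reset followed by a fresh Start.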

This framework allows time to flow forwards and backwards, be
reset to an earlier time or jump to a later time, and allows a variety
of non-linear time-warps [7] to be applied to Activity sub-graphs,
providing effects such as slow-in-slow-out dynamics.
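
As a small illustration of a time-warp, the Python sketch below applies
a slow-in-slow-out curve to a linear rotation. The cubic "smoothstep"
polynomial is one common choice and is an assumption here; the warps
of [7] may differ. The names `slow_in_slow_out` and `warped` are
hypothetical.

```python
def slow_in_slow_out(t):
    """Smoothstep warp: zero rate of change at t = 0 and t = 1."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def warped(behavior, warp):
    """Wrap a function of local time so that it sees warped time instead."""
    return lambda t: behavior(warp(t))


rotation = lambda t: 360.0 * t            # linear rotating-wheel mapping
eased_rotation = warped(rotation, slow_in_slow_out)
samples = [eased_rotation(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
# samples == [0.0, 56.25, 180.0, 303.75, 360.0]
```

The wheel still completes one turn in one time unit, but it starts and
stops gently because the warp changes only how local time advances.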

As indicated in Figure 5, the class Activity may be specialized in
various ways to create different types of time-based behaviors.
Instances of the various classes may be combined within a single
Activity structure, allowing some dynamics to be produced via
execution of scripts while other dynamic behaviors are controlled
by procedural simulations.

Gibbs [4] proposes a similar framework for audio and video media,
which suggests that this framework may be appropriate for
combining animation, interactive simulation, and other media types.


3.3  Events 

The third part of the system is the Event Manager. Events may be
generated by the user through interactive devices or within the
system itself. A rule-based framework is used to describe how events
are to be interpreted and what actions are to be performed in
response to the various events.

In the Event Manager, the four primary classes are Events,
Proposers, Actions, and Contexts. Events are the mechanism that
communicates the description of significant states that the system
achieves. Events occur within Contexts which provide a scoping
of events. Proposers are triggered by specific patterns of events
and either schedule Actions, or inject new Events into the system.
Actions manipulate either the underlying application, the graphical
presentation or the temporal model of the system.

The purpose of an Action is either to produce an interpretation of
the triggering Event, or to cause some function to be performed in
response to the Event. An interpretation of an event or pattern of
events may result in new events being created that contain the
interpretation. This event framework is sufficient to handle user
interaction, and discrete-event style simulation.
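
The interplay of Events, Contexts, Proposers and Actions can be
sketched as follows. This Python fragment is a simplified illustration
rather than the Mirage Event Manager: a "pattern" is reduced to a
single event kind within a context, an Action is modeled as a plain
callable, and all names (`Event`, `Proposer`, `EventManager`, `post`,
`run`) are hypothetical.

```python
from collections import deque


class Event:
    def __init__(self, kind, context, **data):
        self.kind, self.context, self.data = kind, context, data


class Proposer:
    """Fires when an event of a given kind occurs within a given context."""

    def __init__(self, kind, context, respond):
        self.kind, self.context, self.respond = kind, context, respond

    def matches(self, event):
        return event.kind == self.kind and event.context == self.context


class EventManager:
    def __init__(self):
        self.proposers = []
        self.queue = deque()
        self.log = []

    def post(self, event):
        self.queue.append(event)

    def run(self):
        while self.queue:
            event = self.queue.popleft()
            for proposer in self.proposers:
                if proposer.matches(event):
                    # A response is either a new Event (an interpretation)
                    # or an Action, modeled here as a plain callable.
                    result = proposer.respond(event)
                    if isinstance(result, Event):
                        self.post(result)
                    elif callable(result):
                        result(self)


manager = EventManager()

# Interpretation: a mouse click over an object becomes a "select" event.
manager.proposers.append(Proposer(
    "mouse-click", "scene",
    lambda e: Event("select", "scene", target=e.data["over"])))

# Action: selecting an object changes the presentation (recorded in a log here).
manager.proposers.append(Proposer(
    "select", "scene",
    lambda e: (lambda m: m.log.append("highlight " + e.data["target"]))))

manager.post(Event("mouse-click", "scene", over="wheel"))
manager.post(Event("mouse-click", "menu", over="quit"))   # other context: ignored
manager.run()
```

The click in the "menu" context is ignored because no Proposer is
scoped to it, which is the role the text assigns to Contexts.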

4.0  Related  Work 

There are many papers in the literature describing object-oriented
graphics systems [1,2,3,5,8]. Typically these systems are either
two-dimensional interactive systems, or three-dimensional off-line
animation or rendering systems. A recent system that shares many
of the same objectives as this work is the Brown Animation and
Graphics System, BAGS [11]. Both BAGS and Mirage attempt to
replace the traditional modeling / animation / rendering pipeline
with a framework suited to interactive applications. In doing so,
both systems build upon object-oriented programming foundations
to produce systems that are flexible and extensible. The systems
differ, however, in several important respects.

First, BAGS defines its own delegation-based language for graphics,
while Mirage builds upon existing, class-instance languages
such as C++ or CLOS. While the delegation approach used by
BAGS provides a great deal of flexibility during execution, in
practice much of the functionality can be provided using more
conventional languages. Use of a standard language makes integration of
the graphical elements with the application easier, since the
interface elements and application can be built from the same
object-oriented programming language.

Next, the BAGS designers have chosen to tie the time-dependent
elements of the system closely to the graphical elements. The
result is to change graphical attributes such as location or orientation
from simple values to time-dependent functions. In BAGS, time is
a special global variable. Hierarchical time is not explicitly
supported, and effects such as localized time warps are more difficult
to achieve.

Finally, Mirage provides a separate framework for managing
events and actions. Events may result from either user actions
(e.g., mouse clicks), or from within the system (e.g., collision
detection, or application event). The event manager provides a
framework for interpreting patterns of events and scheduling actions
which result from the events. It also provides a mechanism through
which interleaved events in multi-participant systems can be
organized and controlled to provide correct execution without undue
serialization of execution.


5.0  Conclusions 

In summary, the approach presented here addresses the issues of
dynamic graphics by defining a declarative representation system
for graphics, animation and interaction. Mirage also introduces the
concept of windows as three-dimensional graphical objects and a
hierarchical framework for animation. By using object-oriented
programming and declarative representational techniques, superior
modularity and ease of use are achieved as compared to current
systems. The feasibility of this approach is demonstrated through a
working prototype and example applications.
ABSTRACT

We describe a method for preserving a set of geometric
constraints while interactively sculpting a free-form
B-spline surface. The surface seeks a fair shape by minimizing
an appropriate global energy function. The user controls the
surface through the creation and manipulation of geometric
constraints such as interpolated points and curves.

We represent the free-form surface as a B-spline surface, and
formulate a quadratic deformation energy in terms of this
basis. Constraints are represented as gradients of quadratic
functionals which have a global minimum value when the
constraint is satisfied. These constraints are linear in the
surface degrees of freedom, and are maintained during surface
minimization by transforming the constrained surface
equations into an unconstrained system with fewer degrees
of freedom.

Point, curve, and normal constraints are formulated with
reference to a tensor-product B-spline surface. By extension,
the formulations are applicable to any linearly blended surface.

1  INTRODUCTION 

We are interested in developing an easy to use modeling
method for building shapes with free-form surfaces. In
conventional free-form modeling schemes the user must
manage both a large number of control parameters and
difficult to perceive relationships between them to
achieve application specific effects.

The strategy we propose to address this problem is to find a
modeling technique that separates the surface representation
from the surface modeling operators. In this approach one
modeling operator might modify many degrees of freedom
simultaneously to create one highly leveraged modeling
effect. We believe that interactive free-form surface design
based on energy minimizing surfaces and geometric
constraints can be exploited to achieve this separation.

Energy minimizing surfaces mimic the behavior of
everyday physical objects, providing the user with a familiar
metaphor for modifying shape with forces in an intuitive
manner. Surfaces can be pushed, pulled, and inflated to get
desired shapes. The form of the energy functional
determines the properties of the shape being sculpted. We
use a functional that causes the surface to minimize its area
while distributing curvature over large areas to form very
smooth and graceful shapes.

We divide modeling operators into two classes: sculpting
tools and geometric constraints. Sculpting tools are
implemented as sets of forces such as pressure, springs and
gravity to produce qualitative effects like enlarge, attract,
and flatten. Interacting sculpting loads are naturally handled
by adding the effective force vectors at each point on the
surface into a net force. Such surface modeling approaches
have been discussed in [3,4,21].

In contrast, geometric constraints are specified as analytic
conditions which the surface must satisfy explicitly. Such
constraints, including point and curve skinning, and
tangency and normal conditions, allow precise control over
a portion of the surface, and are therefore a means of knitting
free-form shapes to analytic shapes.

This paper deals with enforcing geometric constraints while
sculpting on deformable surfaces. Much of the recent work
in constraint based systems for geometric modeling has
concentrated on preserving relationships between simple
parameterized objects such as lines and circles. These
efforts have been applied to kinematics [8], dynamics based
animation [20], and constraint based geometric modeling
[8,17]. Previous work on enforcing constraints on
parametric surfaces and curves has been based on penalty
methods without deformable surfaces [1,20], transformation
based constraints limited to the explicit degrees of freedom
in the surface representation with deformable surfaces [4],
and Lagrangian constraints [19].

In this paper, we restrict ourselves to a linearly blended
surface (a tensor-product B-spline surface), and consider the
class of geometric constraints which are linear functions of
the explicit degrees of freedom of the shape representation.
Such constraints can be imposed on the surface by a linear
transformation of the constrained surface equations which
reduces them to a smaller, unconstrained system in which
the constraints are implicitly satisfied. It is then possible to
perform other operations (such as surface minimization in
the presence of applied sculpting forces) on the remaining
surface degrees of freedom without violating the constraints.

We show how this technique may be used to constrain any
parametric point on the surface to remain at a fixed location
in 3-space, constrain parametric curves in the surface to
maintain fixed profiles in 3-space (fixed-parameter
curve-skinning), and constrain the 3-space surface normals along a
parametric curve. The method is directly applicable to any
surface representation which is a linear blend of its control
parameters. In this paper B-spline basis functions are used.

2  DEFORMABLE  B-SPLINES 


A deformable surface is designed to mimic real physical
behavior. Like a physical surface, a deformable surface's
deformation behavior is modeled by minimizing a global
energy functional which describes how much energy is
stored in the surface for any deformation shape. The
deformation energy used in this work is of the form

$$E_{deformation} = \int_{\sigma} \left( \alpha\,\mathrm{stretching} + \beta\,\mathrm{bending} \right) d\sigma$$

where α and β are weights on stretching and bending.


This produces a surface which tends to minimize its area to
avoid folding and to distribute curvature over large regions
to make very graceful shapes. The quadratic functional used
in this work is made from the linearized stretching and
bending terms

where w is the surface shape, a contiguous set of points in
3-space represented with parametric variables u and v as

$$w = w(u,v) = [\,x(u,v),\; y(u,v),\; z(u,v)\,]$$

with $w_u$ as shorthand for $\partial w / \partial u$,
$w_{vv}$ as shorthand for $\partial^2 w / \partial v^2$, and
$f = f(w,t)$ denoting the applied sculpting forces
which are changed over time t by the user.

The above problem is discretized by approximating the
minimal surface shape w by $w^h$, a weighted sum of
continuous shape functions. In this paper, we use the
tensor-product B-spline basis as discussed by [Piegl] for the
shape functions yielding

$$w(u,v) \approx w^h(u,v) = \sum_{i=0}^{m} \sum_{j=0}^{n} P_{ij}\, N_{i,p}(u)\, N_{j,p}(v) \qquad (3)$$

where $P_{ij}$ are the familiar [n×m] grid of B-spline control
points and $N_{i,p}$ are the univariate B-spline basis functions
of order p defined recursively as

$$N_{i,0}(u) = \begin{cases} 1 & \text{if } u_i \le u < u_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$N_{i,p}(u) = \frac{u - u_i}{u_{i+p} - u_i}\, N_{i,p-1}(u) + \frac{u_{i+p+1} - u}{u_{i+p+1} - u_{i+1}}\, N_{i+1,p-1}(u) \qquad (5)$$

where $u_i$ are the knots forming a vector $U = \{u_0, u_1, \ldots, u_r\}$.
By convention, knots at the ends of the B-spline are
repeated p+1 times so that a B-spline curve with r knots
will have n control points where r = n + p + 1. The range
of u is limited to $u_p \le u \le u_{r-p}$. In this work B-splines are
of 3rd order, setting p = 3, and $N_{i,p}$ is abbreviated as $N_i$.
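
The recursion of equations 4 and 5 and the tensor-product sum of
equation 3 translate directly into code. The Python below is a plain,
unoptimized sketch: the function names `basis` and `surface_point` are
hypothetical, a zero denominator is treated as a zero term (the usual
convention for repeated knots), and the example knot vector is invented
for illustration.

```python
def basis(i, p, u, knots):
    """N_{i,p}(u) by the recursion of equations 4 and 5 (0/0 taken as 0)."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + p] - knots[i]
    if left > 0.0:
        value += (u - knots[i]) / left * basis(i, p - 1, u, knots)
    right = knots[i + p + 1] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + p + 1] - u) / right * basis(i + 1, p - 1, u, knots)
    return value


def surface_point(u, v, control, knots_u, knots_v, p=3):
    """Evaluate w(u,v) as the weighted sum of control points in equation 3."""
    point = [0.0, 0.0, 0.0]
    for i, row in enumerate(control):
        for j, P in enumerate(row):
            weight = basis(i, p, u, knots_u) * basis(j, p, v, knots_v)
            for axis in range(3):
                point[axis] += weight * P[axis]
    return point


# Cubic example: 5 control points per direction, end knots repeated p + 1 = 4 times.
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
weights = [basis(i, 3, 0.25, knots) for i in range(5)]

# A flat 5 x 5 grid in the plane z = 2; any interior point must also have z = 2.
grid = [[(float(i), float(j), 2.0) for j in range(5)] for i in range(5)]
w = surface_point(0.25, 0.6, grid, knots, knots)
```

The basis weights are non-negative and sum to one inside the knot
range, so the evaluated point is a convex blend of nearby control
points. Note that the half-open interval in equation 4 makes this
simple version return zero weights at the very last knot value.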

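The B-spline approximation for shape $w^h$ is substituted
back into the original minimum principle yielding a
discrete matrix minimum problem

$$\min_x \left( x^T K_\sigma\, x - F_\sigma^T x \right) \qquad (6)$$

where the unknowns are ordered into a vector as
$x = [\,P_{0,0}\; P_{0,1}\; \cdots\; P_{m,n}\,]^T$. $K_\sigma$ and $F_\sigma$ define the stiffness
matrix and forcing vector. These terms are given by

$$K_\sigma = \iint_\sigma \left( \Phi_b^T \beta\, \Phi_b + \Phi_s^T \alpha\, \Phi_s \right) du\, dv \quad \text{and} \quad F_\sigma = \iint_\sigma \Phi^T f\, du\, dv$$

where

$$\Phi_b = \begin{bmatrix} \Phi_{uu} \\ \Phi_{vv} \\ 2\Phi_{uv} \end{bmatrix}, \qquad \Phi_s = \begin{bmatrix} \Phi_u \\ \Phi_v \end{bmatrix},$$

α and β are the matrices of stretching weights $\alpha_{ij}$ and
bending weights $\beta_{ij}$, and

$$\Phi = [\,N_0(u)N_0(v)\;\; N_0(u)N_1(v)\;\; \cdots\;\; N_m(u)N_n(v)\,]$$

is an ordered set of basis functions.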

The minimum of equation 6 is found by solving

$$K_\sigma\, x = F_\sigma(w,t) \qquad (8)$$

Simple mass and damping effects are added to the surface as

$$M\ddot{x} + B\dot{x} + K_\sigma\, x = F_\sigma(w,t)$$

where M = ρI, B = μI, I is the identity matrix, ρ is a
mass density and μ is a stabilizing damping term.

These equations are integrated through time by using finite
differences for the temporal derivatives which results in a
matrix equation relating the shape at time t+Δt to the shape
and sculpting loads at time t and t-Δt

$$K\, x_{t+\Delta t} = F(f_t,\, x_t,\, x_{t-\Delta t})$$

Solving for x generates the control point locations used to
generate the surface in equation 3. The matrices $K_\sigma$ and K
are symmetric and positive definite due to the form of the
selected energy functional. The local support property of
the B-spline basis functions makes $K_\sigma$ sparse.
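
One way such a time-stepping equation can arise is sketched below for a
single unknown, so that no matrix solver is needed. The particular
discretization (a central difference for the acceleration, a backward
difference for the velocity, stiffness taken at the new time) is an
assumption chosen for illustration; the text does not say which finite
differences were used. The names `step` and `settle` and all numeric
values are hypothetical.

```python
def step(x_now, x_prev, force, k, rho, mu, dt):
    """One implicit step of rho*x'' + mu*x' + k*x = force for a single unknown."""
    k_eff = rho / dt**2 + mu / dt + k                      # plays the role of K
    rhs = force + (2.0 * rho / dt**2 + mu / dt) * x_now - rho / dt**2 * x_prev
    return rhs / k_eff                                     # plays the role of F / K


def settle(force, k=4.0, rho=1.0, mu=3.0, dt=0.05, steps=2000):
    x_prev = x_now = 0.0
    for _ in range(steps):
        x_prev, x_now = x_now, step(x_now, x_prev, force, k, rho, mu, dt)
    return x_now


x_rest = settle(force=2.0)   # approaches the static solution force / k
```

With damping present the iteration settles to the static solution of
equation 8, here force / k = 0.5, which is the behavior a user sees
when a sculpting load is held constant.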

3  CONSTRAINTS  BY  NULL-SPACE  PROJECTION 

An attractive way to enforce constraints on a system of
equations is to transform it into an unconstrained system of
equations with fewer degrees of freedom. We do this for
systems of linear equations by projecting the system of
equations onto the subspace of solutions which satisfies the
constraints. General linear constraints are written as

$$Ax = b \qquad (9)$$

where the vector x represents the degrees of freedom, each
row of the m×n matrix A (m < n) represents a linear
constraint on x, and the vector b represents the values of
these constraints. Given a particular solution $x_0$, the space
of satisfying vectors for the system can be expressed as

$$x = Zy + x_0 \qquad (10)$$

where the columns of the n×z matrix Z span the null-space
of A, and the vector y represents a reduced set of z
unconstrained degrees of freedom. Substituting equation 10
into 9 shows that Z has the property that AZy = 0 for all
y. The number of columns in Z is n minus the number of
independent rows in A because each independent constraint
in A removes one degree of freedom from the system.

Given Z and $x_0$, the minimization problem of equation 8
can be projected onto this reduced space as

$$Z^T K Z\, y = Z^T F - Z^T K x_0 \qquad (11)$$

Equation 10 regenerates a properly constrained solution x to
the original minimization problem for each unconstrained
solution y of the projected minimization problem.
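
Equations 9 to 11 can be exercised end to end on a tiny system. The
Python sketch below is illustrative only: it obtains Z and $x_0$ by
Gauss-Jordan elimination with full pivoting, uses exact rational
arithmetic so that zero tests are unambiguous (a production system
would use floating point with tolerances), and the names `eliminate`,
`matvec`, `transpose` and `constrained_minimum` are hypothetical.

```python
from fractions import Fraction


def eliminate(A, b):
    """Gauss-Jordan with full pivoting on [A | b].

    Returns (Z, x0): the columns of Z span the null space of A and A x0 = b.
    Raises ValueError when the constraints conflict.
    """
    m, n = len(A), len(A[0])
    rows = [[Fraction(v) for v in A[i]] + [Fraction(b[i])] for i in range(m)]
    perm = list(range(n))            # perm[k] = original index of column k
    rank = 0
    while rank < min(m, n):
        # Full pivoting: largest remaining entry over all rows and columns.
        value, pi, pj = max((abs(rows[i][j]), i, j)
                            for i in range(rank, m) for j in range(rank, n))
        if value == 0:
            break
        rows[rank], rows[pi] = rows[pi], rows[rank]
        for row in rows:
            row[rank], row[pj] = row[pj], row[rank]
        perm[rank], perm[pj] = perm[pj], perm[rank]
        pivot = rows[rank][rank]
        rows[rank] = [v / pivot for v in rows[rank]]
        for i in range(m):
            if i != rank and rows[i][rank] != 0:
                factor = rows[i][rank]
                rows[i] = [a - factor * p for a, p in zip(rows[i], rows[rank])]
        rank += 1
    if any(row[n] != 0 for row in rows[rank:]):
        raise ValueError("conflicting constraints")
    z = n - rank
    x0 = [Fraction(0)] * n
    Z = [[Fraction(0)] * z for _ in range(n)]
    for k in range(rank):                      # dependent unknowns
        x0[perm[k]] = rows[k][n]
        for j in range(z):
            Z[perm[k]][j] = -rows[k][rank + j]
    for j in range(z):                         # independent unknowns: y itself
        Z[perm[rank + j]][j] = Fraction(1)
    return Z, x0


def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def constrained_minimum(K, F, A, b):
    """Solve K x = F restricted to A x = b, following equations 10 and 11."""
    Z, x0 = eliminate(A, b)
    Zt = transpose(Z)
    if not Zt:
        return x0                                        # fully constrained
    KZ = [matvec(K, column) for column in Zt]            # columns of K Z
    reduced_K = [[sum(p * q for p, q in zip(za, kz)) for kz in KZ] for za in Zt]
    reduced_F = matvec(Zt, [f - k for f, k in zip(F, matvec(K, x0))])
    _, y = eliminate(reduced_K, reduced_F)               # reduced system is square
    return [zy + x for zy, x in zip(matvec(Z, y), x0)]


K = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
F = [2, 6, 4]
A = [[1, 1, 0], [2, 2, 0]]       # second row is a redundant, compatible constraint
b = [2, 4]
x = constrained_minimum(K, F, A, b)   # x == [0, 2, 2]
```

Without the constraint the solution would be [1, 3, 2]; with it, the
first two unknowns move to [0, 2] so that they sum to 2, while the
third, which the constraint does not touch, keeps its value.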


Ax  =  0,  reducing  the  first  n-z  rows  of  A  to  the  identity. 
This  produces  the  specially  factored  matrix  A' 


that  explicitly  separates  the  x  degrees  of  freedom  into 
dependent  and  independent  sets.  The  identity  submatrix  is 
associated  with  the  n-z  dependent  degrees  of  freedom  xj 
which  arc  "removed"  from  equation  1 1  by  constraints  and 
the  R  submatrix  is  associated  with  the  remaining 
independent  degrees  of  freedom  in  y.  Each  dependent 
constraint  in  A  produces  a  zero  row  at  the  bottom  of  the  A' 
matrix.  The  null-space  basis  Z  is  found  by  observing  tliai 
A'x  =  0  is  true  whenever  xj  =  -Ry  so  that 

0  =  A‘x  =  A‘  =A'  y  =  A‘Zy  =  0  and  Z=  1^ 

where  I  is  the  zxz  identity  matrix.  Note  that  full  pivoting 
is  absolutely  essential  during  this  procedure  if  a  well- 
conditioned  basis  is  to  result  (orthogonal  factorizuitions  such 
as  the  SVD  or  QR  arc  in  general  better  conditioned,  though 
more  computationally  expensive). 

It  is  important  to  note  that  this  technique  successfully 
generates  Z  in  the  face  of  redundant  constraints  in  A. 
Redundant  constraints  can  be  cither  compatible  as  in  the 
case  of  multiple  hinges  supporting  a  single  door  or 
conflicting.  In  our  system  wc  identify  conflicting 
constraints  when  solving  for  xq.  Wc  treat  all  dependent 
constraints  as  compatible  when  solving  for  Z  since  Z  is 
not  affected  by  the  particular  values  of  the  constraints.  In 
this  system  a  new  xo  is  computed  each  time  the  user 
changes  a  consuaint  value,  while  a  new  Z  is  only  computed 
each  time  a  consuaint  is  added  or  deleted. 

4  GEOMETRIC  CONSTRAINTS 


We distinguish between two kinds of geometric constraints,
frozen and tracked. A frozen constraint is added to the
system at a particular time by freezing some geometric
property of the surface while allowing the rest of the surface
to vary. A tracked constraint varies the value of the
constraint over time, also causing the surface to deform.
Our current strategy for exploiting constraints is to first
freeze in constraints and then to track them. A frozen
constraint has the advantage that at least the current surface
configuration is guaranteed to satisfy the constraint.


There are any number of stable ways of calculating Z (see
Gill et al.). In general, selecting a subset of A's columns
on which to base Z is a delicate procedure, especially in the
presence of nearly-dependent constraint rows (Golub and
Van Loan, Matrix Computations, p 571).

A very simple procedure for computing Z is to apply
Gaussian elimination with full pivoting to the system


4.a  Point  Constraints 

The simplest constraint to visualize is a freezing point
constraint. A particular surface point, identified by a
parametric location (u^0, v^0), is fixed to its current position
for all future times t+Δt. The constraint equation is
generated from the B-spline surface equation and the current
values of the control points P_ij(t) as

$$w(u^0, v^0) = \sum_{i=0} \sum_{j=0} P_{ij}(t)\, N_{i,p}(u^0)\, N_{j,p}(v^0) \qquad (14)$$



Each  constrained  point  generates  one  additional  constraint 
equation  that  is  added  to  the  constraint  matrix  A. 
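
A point constraint row can be sketched in Python as follows, assuming NumPy, a uniform knot vector, and the Cox-de Boor recursion for the basis functions. The names bspline_basis and point_constraint_row, and the use of p for the polynomial degree, are assumptions made for this illustration. The row holds the products N_i,p(u^0) N_j,p(v^0), so its dot product with the flattened control points gives the surface point being frozen.

```python
import numpy as np

def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion: value at u of the i-th degree-p basis function."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    d2 = knots[i + p + 1] - knots[i + 1]
    if d1 > 0:
        left = (u - knots[i]) / d1 * bspline_basis(i, p - 1, u, knots)
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * bspline_basis(i + 1, p - 1, u, knots)
    return left + right

def point_constraint_row(u0, v0, nu, nv, p, knots_u, knots_v):
    """One row of A: coefficients multiplying the nu*nv control points."""
    Nu = np.array([bspline_basis(i, p, u0, knots_u) for i in range(nu)])
    Nv = np.array([bspline_basis(j, p, v0, knots_v) for j in range(nv)])
    return np.outer(Nu, Nv).ravel()   # entry i*nv + j is N_i(u0) * N_j(v0)

# Hypothetical 4x4 control net of degree 2 on uniform knots 0..6.
p, n = 2, 4
knots = np.arange(n + p + 1, dtype=float)
row = point_constraint_row(2.5, 3.25, n, n, p, knots, knots)
```

The same row applies to each coordinate of the control points, with the right-hand side taken from the current position of the surface point.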

4.b  Curve  Constraints 

The  curve  constraint  is  considerably  more  complicated  than 
the  point  constraint.  The  constraint  allows  any  curve  lying 
within  the  surface  to  be  frozen  at  time  t  such  that  the  rest  of 
the  surface  can  be  sculpted  in  future  times  without  violating 
the frozen shape. The constraint equations for the curve are
generated by considering a positive definite error functional
over the length of the curve as

$$e = \frac{1}{2} \int_{\text{curve}} \left( c(s) - c^0(s) \right)^2 ds \qquad (15)$$

where c(s) is the 3D shape of the curve in the surface given by

$$c(s) = w(l(s)) = \sum_{i=0} \sum_{j=0} P_{ij}\, N_{i,p}(u(s))\, N_{j,p}(v(s)) = \sum_{i=0} \sum_{j=0} P_{ij}\, N_{ij}(l(s))$$

where l(s) = [u(s) v(s)], a curve lying in the surface, and
c^0(s) = the target 3D curve shape at time t = 0.

The value of the error functional for the curve constraint is
both zero and a minimum when the curve c(s) is exactly
equal to the curve c^0(s). We can formulate this as a linear
constraint by requiring that the error functional always be at
a minimum, that is, that its gradient with respect to the degrees of
freedom be 0. The constraint that each term of the gradient
be 0 yields one linear constraint equation for each degree of
freedom in the system.

Finding the minimum error value e will automatically
satisfy frozen constraints. However, this will not be
generally true for tracking constraints where c^0(s) is
allowed to change over time. In such situations the system
will find the solution which best satisfies all the constraints,
i.e. it finds the smallest value available for e given
the shape representation, but will not guarantee satisfying
the constraints exactly, i.e. the value of e might not equal
zero. In this work we limit ourselves to frozen constraints.


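The linear curve constraint equations are

$$\frac{\partial e}{\partial P_{ij}} = \int_{\text{curve}} \left( c(s) - c^0(s) \right) \left( \frac{\partial c}{\partial P_{ij}} - \frac{\partial c^0}{\partial P_{ij}} \right) ds = 0$$

For the B-spline basis functions

$$\frac{\partial c}{\partial P_{ij}} = N_{ij}(l(s))$$

and for a freezing constraint,

$$\frac{\partial c^0}{\partial P_{ij}} = 0$$
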

Once integrated, the above equations yield a linear set of
equations in P_ij, Cx = v. A row in C is given by

$$C_{kl} = \int_{\text{curve}} \Big[ \sum_{i=0} \sum_{j=0} P_{ij}\, N_{ij}(l(s)) \Big] N_{kl}(l(s))\, ds$$

and the associated term in v is given by

$$v_{kl} = \int_{\text{curve}} \Big[ \sum_{i=0} \sum_{j=0} P_{ij}(0)\, N_{ij}(l(s)) \Big] N_{kl}(l(s))\, ds \qquad (19)$$


The  curve  constraint  generates  one  constraint  equation  for 
each  control  point  in  the  surface.  Typically,  most  of  these 
constraints  are  redundant  or  zero  equations  leaving  the 
surface  several  degrees  of  freedom  in  which  to  continue 
moving. In our system we generate all nonzero constraints
and depend on the construction of the Z matrix to eliminate
the redundant constraints.
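
The assembly of C and v can be sketched in Python, assuming NumPy and midpoint quadrature along the curve. Everything named here is hypothetical: the hat-function basis stands in for a degree-1 tensor-product B-spline, the curve l(s) = (1 + s, 1) is an arbitrary example, and p0 holds one coordinate of the control points at freeze time. The sketch keeps only nonzero rows and leaves any remaining redundancy to the construction of Z.

```python
import numpy as np

def hat(i, x):
    """Degree-1 B-spline (hat function) centered at integer knot i."""
    return max(0.0, 1.0 - abs(x - i))

def basis_on_curve(s, n=4):
    """Values N_ij(l(s)) along the example curve l(s) = (1 + s, 1), flattened."""
    u, v = 1.0 + s, 1.0
    Nu = np.array([hat(i, u) for i in range(n)])
    Nv = np.array([hat(j, v) for j in range(n)])
    return np.outer(Nu, Nv).ravel()

def curve_constraint(basis, p0, s0=0.0, s1=1.0, samples=64):
    """Assemble C and v of C x = v by midpoint quadrature over [s0, s1]."""
    ds = (s1 - s0) / samples
    C = np.zeros((p0.size, p0.size))
    for q in range(samples):
        N = basis(s0 + (q + 0.5) * ds)
        C += np.outer(N, N) * ds      # C[kl, ij] approximates the integral of N_kl * N_ij
    v = C @ p0                        # frozen target built from freeze-time control points
    keep = np.abs(C).sum(axis=1) > 1e-12   # drop rows whose basis function misses the curve
    return C[keep], v[keep]

p0 = np.arange(16.0)                  # one coordinate of a 4x4 control net
C, v = curve_constraint(basis_on_curve, p0)
```

In this example only two basis functions are nonzero on the curve, so all other rows are zero equations and are discarded before the system is handed to the null-space computation.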

4.c Surface Normal Constraints


We formulate a constraint on the surface normal along a
curve as a pair of constraints. The surface normal at a point
in the surface is in the direction of the cross product of any
two independent surface tangent vectors at that point. In
particular, surface tangents in the direction of a curve and
normal to a curve generate the surface normal as

$$n = \left| w_t \times w_n \right| \qquad (20)$$

where n = surface normal at a point on the surface and
w_t = surface tangent in the direction of the curve and
w_n = surface tangent in the direction normal to the
curve in parameter space.

The surface tangents w_t and w_n are related to the parametric
derivatives w_u and w_v along the length of the curve c(l(s))
by the linear rotation

$$\begin{bmatrix} w_t \\ w_n \end{bmatrix} = \begin{bmatrix} u_s & v_s \\ -v_s & u_s \end{bmatrix} \begin{bmatrix} w_u \\ w_v \end{bmatrix}$$

where u_s and v_s are the components of the normalized curve
tangent in parametric space given as l_s = (u_s, v_s).

The functions w_u and w_v along the length of the curve are

$$c_u(s) = w_u(l(s)) = \sum_{i=0} \sum_{j=0} P_{ij}\, \frac{dN_{i,p}(u(s))}{du}\, N_{j,p}(v(s))$$

and

$$c_v(s) = w_v(l(s)) = \sum_{i=0} \sum_{j=0} P_{ij}\, N_{i,p}(u(s))\, \frac{dN_{j,p}(v(s))}{dv}$$

The error functionals for the constraints are written as

$$e_t = \frac{1}{2} \int_{\text{curve}} \left( c_t(s) - c_t^0(s) \right)^2 ds \quad\text{and}\quad e_n = \frac{1}{2} \int_{\text{curve}} \left( c_n(s) - c_n^0(s) \right)^2 ds \qquad (22)$$

Like the curve constraint, finding the minimum of the error
functionals e_t and e_n with respect to the degrees of freedom
P_ij yields the sets of constraint equations to be enforced.
The combination of constraining the curve's tangent shape
c_t and the curve's normal shape c_n acts to constrain the
surface normal along the length of the curve. Note that the
constraint on curve shape c can replace the constraint on c_t
since constraining the curve's shape automatically
constrains the higher order surface derivatives along the
length of the curve.
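
The rotation from the parametric derivatives to the curve-aligned tangents can be sketched in Python, assuming NumPy. The function name curve_frame is hypothetical, and the sketch returns a unit-length normal, which is one reading of the normalization in equation 20.

```python
import numpy as np

def curve_frame(w_u, w_v, l_s):
    """Rotate (w_u, w_v) into tangents along and across the curve, then form the normal."""
    l_s = np.asarray(l_s, dtype=float)
    u_s, v_s = l_s / np.linalg.norm(l_s)      # normalized curve tangent in parameter space
    w_t = u_s * w_u + v_s * w_v               # surface tangent along the curve
    w_n = -v_s * w_u + u_s * w_v              # surface tangent normal to the curve
    n = np.cross(w_t, w_n)
    return w_t, w_n, n / np.linalg.norm(n)

# Flat patch with a curve running diagonally in parameter space.
w_t, w_n, n = curve_frame(np.array([1.0, 0.0, 0.0]),
                          np.array([0.0, 1.0, 0.0]),
                          [1.0, 1.0])
```

Since the 2x2 matrix is a rotation, the cross product of w_t and w_n equals that of w_u and w_v, so constraining both rotated tangents along the curve fixes the normal there.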

5  RESULTS 


The  techniques  discussed  in  this  paper  were  implemented  in 
an  interactive  sculpting  design  package  that  runs  on  a 
Silicon  Graphics  workstation.  An  example  of  the  system's 
modeling capability is shown in Figure 1. The surface in
Figure 1 is a 3rd order tensor-product B-spline with an 8x8
array  of  control  points.  The  surface  is  constrained  to 
interpolate  the  closed  curve  shown  as  a  heavy  dark  line. 
The  constraint  eliminates  24  of  the  original  64  system 
degrees  of  freedom.  Pressure  sculpting  loads  are  applied  to 
the surface inside the closed constraint. The sequence of
images in Figure 1 is produced by varying the magnitude
of the pressure force interactively with a slider bar. The
curve  constraint  is  enforced  exactly  at  all  times  while  the 
surface  is  sculpted. 

6  CONCLUSIONS  AND  FUTURE  WORK 


An interactive modeling system designed to sculpt free-form
surfaces in the presence of point and curve constraints, based
on the techniques described in this paper, has been implemented on
a Silicon Graphics workstation. The system supports
interactive sculpting under any combination of frozen
constraints. Based on this experience we make the
following conclusions.


We have formulated a strategy for enforcing linear
constraints on linearly blended surfaces in interactive time.
Using the B-spline basis functions as a shape representation
we have shown how this strategy can be used to enforce a
rich range of geometric constraints. We have shown how
to constrain a point in the surface, and the shape of a curve
lying in the surface, as well as its higher order derivatives.
Exploiting the first order surface derivative constraint we
were able to build a surface normal constraint.

Figure 1. A closed curve constraint applied to a surface


An important limitation to the technique presented is that
the shape of the geometric constraint in the surface’s uv-plane
must remain fixed over time. Otherwise, the
nonlinearity of the surface basis functions produces nonlinear
constraint equations. Although techniques are available for
solving nonlinear constraint problems they tend to be
inappropriate for interactive systems since they depend on
iterative refactorizations of the basis. What is needed in
future work is a good solution for selecting suitable
parameterizations for constraints. Such a solution would
enable a very exciting system for modeling with generalized
curve skinning.

Another limitation of the system described here involves
the discretization error of the surface approximation. The
curve constraint and energy minimization techniques used
here find the minimum solution for a given surface
representation, but such a solution may or may not be an

ABSTRACT

In this paper, we present techniques for integrating constraint
and direct manipulation approaches to geometric modeling.
Direct manipulation positioning techniques are augmented
to provide the option of making the relationships they establish
persistent. Differential constraint techniques are used to
maintain these relationships during subsequent editing. Issues
in displaying and editing constraints are also addressed.
By integrating constraints with direct manipulation, it is possible
to build systems that provide the power of explicit representation
of geometric relationships and the properties which
make direct manipulation so attractive.

INTRODUCTION 

Geometric relationships between parts are an important element
in geometric models. From the earliest days of interactive
systems [13], the benefits of using constraints to explicitly
represent these relationships have been known. Although
many have discussed the value of constraints, constraint-based
approaches have not been successful in practical systems.
Their success has been hindered by a large number of
difficult issues.

In contrast to the failure of constraints, direct manipulation
systems have been successful for geometric modeling tasks.
Users control the geometry of objects by interactively grabbing
and pulling them, with continuous update providing
feedback. Such systems employ snapping techniques, such
as grids, to aid in establishing relationships, but these relationships
are immediately forgotten. They are neither explicitly
represented nor automatically preserved. It is the user’s
job to maintain them during subsequent editing.

In this paper we combine the two approaches: snapping
techniques establish relationships and constraint techniques
maintain them during subsequent dragging. Our integrated
approach distinguishes the problem of establishing relationships
from that of maintaining them during subsequent editing.
This separation allows us to skirt several difficult issues
in constraint-based systems. Integration with direct manipulation
addresses issues in solving, specifying, debugging,
displaying and editing constraints.

The Briar drawing program demonstrates our approach. When
direct manipulation snapping establishes a new relationship,
augmented snapping provides the user with the option of
transforming it into a persistent constraint. Differential constraint
techniques can maintain these during dragging. Direct
manipulation techniques also address editing constraints.

ESTABLISHING  RELATIONSHIPS  IN  DRAWINGS 

Previous constraint-based systems have operated in what we
call a “specify-then-solve” approach to constraint usage. In
such systems, the user describes the model by declaring relationships
which must hold true and the system configures the
model to meet these requirements. This approach allows a
user to specify the important aspects of a design and have the
system resolve the details. Because the system explicitly represents
the relationships, it can ensure that these constraints
continue to hold during subsequent editing.

There are problems in using the specify-then-solve approach.
One is “solving” the constraints: finding a new configuration
of the geometric model which meets the set of requirements.
This is difficult because, in general, systems of non-linear
algebraic equations must be solved from arbitrary starting
points. While this problem is intractable [11], systems can
usually operate by limiting the class of constraints which
can be handled (as done by [6, 14]) or using temperamental
numerical techniques (as done by [9, 12]). If no configuration
is found that satisfies the constraints, it can be difficult
to determine whether none exists or if the solver was just
unable to find one. If no solution exists, the conflicts must
be diagnosed and debugged. If the solver does find a new
configuration, it must help the user understand how and why
it jumped to the new state.

These three challenges, solving constraint-satisfaction problems
from arbitrary starting points, presenting state jumps to
users, and coping with conflicts, must be addressed to build a
specify-then-solve system. However, these issues only arise
when the constraint mechanism is used to reconfigure the
model to establish new relationships. To skirt these difficult
issues, we separate the task of maintaining existing relationships
from that of initially satisfying them.

Our systems use direct manipulation to establish relationships
and use constraint techniques for maintaining them
during subsequent editing. Constraints are only generated
for relationships which exist in the drawing. They start out
satisfied so there is never a need to jump from an arbitrary
state to a consistent one. There are no constraint-satisfaction
problems to solve or state jumps to explain. There is no concern
about conflicting or unsatisfiable constraints, since there
exists at least one configuration which meets the constraints.

MAINTAINING RELATIONSHIPS

When initial solutions are provided, the task of constraint
techniques changes: instead of establishing the relationships,
constraint-based techniques are used to maintain them. Rather
than jumping from an inconsistent state to one where the constraints
are met, constraint techniques permit users to drag
models and have the constraints enforced as the drawing follows
with continuous motion. We call this facility to drag
constrained models Differential Constraints.

Unlike solving non-linear algebraic equations, good techniques
exist for maintaining constraints during dragging. We
use techniques which treat the motion of the model as a differential
equation and provide methods for maintaining sets
of non-linear constraints by solving systems of sparse linear
equations [4]. Alternatively, solving can be accomplished
using a standard constraint-solving approach: the model is
repeatedly perturbed slightly, then re-solved.
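
The idea of maintaining non-linear constraints through linear solves can be sketched in Python, assuming NumPy. This is an illustrative formulation rather than the method of [4]: the function name constrained_step, the step size dt, and the feedback gain k are hypothetical, and a dense least-squares solve stands in for a sparse linear solver. Each step projects the user's desired velocity onto the directions that keep the constraints satisfied to first order, with a feedback term that counteracts numerical drift.

```python
import numpy as np

def constrained_step(x, v, constraints, jacobian, dt=0.01, k=10.0):
    """Advance state x one small step toward desired velocity v
    while keeping constraints(x) == 0 to first order."""
    J = jacobian(x)                   # rows are constraint gradients at x
    c = constraints(x)                # current constraint residuals
    # Solve (J J^T) lam = J v + k c, so that J (v - J^T lam) = -k c.
    lam = np.linalg.lstsq(J @ J.T, J @ v + k * c, rcond=None)[0]
    return x + dt * (v - J.T @ lam)

# Hypothetical example: a point kept on the unit circle while dragged upward.
circle = lambda x: np.array([x @ x - 1.0])
circle_jac = lambda x: 2.0 * x[None, :]
x = np.array([1.0, 0.0])
for _ in range(100):
    x = constrained_step(x, np.array([0.0, 1.0]), circle, circle_jac)
```

The state starts on the constraint and each step only requires a linear solve, which is what allows the constraint to be maintained at interactive rates during dragging.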

Fast computers and good algorithms allow update rates which
give the appearance of continuous motion. This rapid feedback
is essential. Although the trajectory the model follows
is not part of the resulting drawing, this animation makes it
possible for users to employ their perceptual skills to connect
states of the drawing with many things changing between
them [2].

Differential constraints provide a natural way to incorporate
constraints into a conventional drawing system. Objects are
dragged the same way, except that relationships can be maintained
among them. This allows the drawing process to be
incremental: each new relationship added to a drawing does
not disturb previously established ones.

The ability to directly manipulate constrained models helps
address many of the issues in constraint-based systems. It
provides an easy way for the user to explore underconstrained
spaces, permitting them to experiment with models to understand
how they work, or why they do not. The existence
of a direct manipulation facility means that not all parts of the
model need to be specified by constraints. If it is
difficult to devise a way to describe an aspect of a drawing
with constraints, direct manipulation can be used instead.

Constraints can aid in the direct manipulation process by
providing the user with “extra hands” to hold things in place.
Providing the user with “lightweight constraints” which are
easy to place temporarily to aid in manipulation is a useful
feature in modeling systems.


SPECIFYING  CONSTRAINED  MODELS 

Rather than using constraints, our approach, like most direct
manipulation systems, uses gravity to help users establish
relationships in models. The drawing cursor follows the motion
of the pointing device, but snaps to locations which will
establish relationships in the model when it is close to them.
This idea of gravity has existed for a long time, having been
demonstrated as early as Sketchpad [13]. The most common
variant of gravity is the uniform grid. A more interesting
technique is Snap-Dragging [1] which extends gravity by expanding
the set of snapping targets to include intersections
and construction lines.

Gravity is successful at helping a user establish relationships
in models, but previous systems promptly forget these relationships
once the positioning operation is complete. To
employ constraint maintenance, these newly established relationships
must be made into persistent constraints. The user
could be required to explicitly identify the constraints, but
this creates excess work: each relationship is specified twice,
once to establish it and once to identify it as a constraint.
Previous systems have attempted to infer constraints after
drawing operations by looking at the resulting drawing [10],
or at a trace of user actions [7]. Because this information typically
does not specify the relationships unambiguously, these
systems relied on heuristics or asked the user to resolve the
ambiguity [8]. Our approach augments positioning methods
so they, in addition to location, unambiguously specify the
relationships which are being established.

Our augmented snapping technique lets direct manipulation
positioning specify constraints as well as location. It enhances
the snapping operation so that it generates constraints.
The basic idea is that cursor placement operations contain
information about why an object was positioned where it
was, and can, therefore, also provide a constraint specification.
Suppose the user, while dragging an object, moves the
pointer near another object so that the cursor and the point being
dragged snap to the second object. Snapping has helped
the user establish a relationship between the dragged point
and the target object. We provide the user with the option
of making this relationship persistent so it can be preserved
during subsequent editing.

When a snapping operation occurs, the system acknowledges
it by showing the newly established relationship to the user.
The user has the opportunity to accept the new relationship,
transforming it into a persistent constraint. To make the
constraint creation process more transparent, the default can
be to accept new constraints.¹

¹Although we provide an “accident prone” mode where acceptance is
not the default, we find that it is seldom used.

Augmented snapping permits direct manipulation techniques
to be integrated with constraints. Since snapping is used for
all drawing operations, such as creating and moving objects,
all of these operations can specify constraints. Constraint
generation is opportunistic: as the user draws, constraints are
created when relationships are established. Aside from the
occasional rejection (or acceptance, if that is not the default)
of a constraint, the interface should not require any additional
effort from the user. Such an interface feels just as fast and
clean as the non-augmented version.
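
A snapping operation that returns a constraint along with a location can be sketched in Python. This is a hypothetical illustration, not Briar's code: the Constraint record, the function augmented_snap, the snapping radius, and the accept flag (standing in for the user's accept or reject choice) are all assumed names, and only point targets are handled.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Constraint:
    kind: str        # e.g. "points-coincident"
    dragged: str     # name of the point being dragged
    target: str      # name of the object snapped to

def augmented_snap(cursor, dragged, targets, radius=5.0, accept=True):
    """Return (position, constraint): the snapped cursor position and,
    if a snap occurred and was accepted, the constraint it establishes."""
    best = None
    for name, pos in targets.items():
        d = math.dist(cursor, pos)
        if d <= radius and (best is None or d < best[0]):
            best = (d, name, pos)     # nearest target within the snapping radius
    if best is None:
        return cursor, None           # no snap, so no relationship and no constraint
    _, name, pos = best
    constraint = Constraint("points-coincident", dragged, name) if accept else None
    return pos, constraint

targets = {"a": (10.0, 10.0), "b": (50.0, 50.0)}
snapped = augmented_snap((12.0, 11.0), "p", targets)
free = augmented_snap((30.0, 30.0), "p", targets)
```

Because the constraint is produced by the same operation that positions the point, it is satisfied at the moment it is created.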

As a constraint specification technique, augmented snapping
offers other advantages. It does not require the user to learn
new commands for each type of constraint since a uniform
interface creates all constraints. Because it provides both
constraints and an initial configuration that satisfies them,
augmented snapping cannot create conflicting constraints.

Careful attention to the user interface is crucial to making
augmented snapping work. Feedback must show the snapping
operations to the user so it is clear what relationship is
being established. When a new relationship is established, it
must be displayed prominently enough that it is clear what
constraint will be created, but not so obtrusively as to hinder the
drawing process.

Augmented snapping only generates constraints for relationships
which are unambiguously specified by the user’s actions.
A snapping operation unambiguously specifies a relationship,
but if multiple objects coincide, it can be ambiguous
which to snap to. Feedback, which clearly shows which
object is snapped to, and a cycling mechanism to choose
between potential snapping targets resolve this problem.
Pruning the set of objects that are snapped to (for example,
avoiding snaps which would create a redundant constraint)
avoids excess cycling.

Augmented snapping does not guess about the user’s intentions.
It relies on the construction process to obtain constraints.
The user may construct a model in a manner which
does not convey the desired constraints. To curtail this, it is
important to design modeling operations which make it easy
to convey what is intended, rather than just what is convenient
to express. For example, making two objects be the
same size should be no more work than making them both be
the same fixed size.

VISUAL  REPRESENTATIONS  FOR  CONSTRAINTS 

Constrained drawings have more state that must be displayed
to the user than non-constrained ones do. A system must convey
to the user not only the geometry of the model, but also
the constraints. The user must be able to edit this structural
information as well as the geometry. Although textual languages
for describing constraints, such as in [9, 14], are easy
to edit, they are distinct from the drawing and can be difficult
to connect to their corresponding places in the model. Visual
representations [5, 12] superimpose symbols for constraints
directly on the model. Unfortunately, devising clear visual
representations is challenging and editing such representations
is often difficult.

When differential constraints are used, the continuous motion
and ability for users to experiment with models can convey
much of the information about the constraints. We also use a
visual representation for constraints.

The problem of editing constraints transcends visual representations.
Before being able to delete or modify a constraint,
the user must figure out which constraints to alter. We have
developed methods for editing constraints which avoid this
problem by having the users edit constraints by referring to
the desired effects, not to the constraints themselves. Instead
of pointing at constraints, users directly manipulate objects
to show how they are to move. For example, constraint
maintenance can be disabled so objects move freely. Constraints
which are broken are clearly noted to the user. When
maintenance is restarted, violated constraints are removed.
A variant is a “rip” command which allows the user to pull
part of an object free from its constraints.

Designing the semantics of the constraints properly can also
reduce problems in the visual language. For example, when
a group of points is connected together, an equivalence class
is used rather than a large number of binary connection relations.
This is also significant since it removes the need to
remember which point is connected to which other point if
some are to be disconnected.
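
The equivalence-class representation can be sketched in Python. The class name CoincidenceClasses and its methods are hypothetical and are not taken from Briar; the sketch stores one shared set per group of connected points, so disconnecting a point never requires knowing which pairwise connections originally created the group.

```python
class CoincidenceClasses:
    """Groups of connected points stored as shared sets (equivalence classes)."""

    def __init__(self):
        self._class_of = {}           # point name -> shared set of connected points

    def connect(self, a, b):
        sa = self._class_of.setdefault(a, {a})
        sb = self._class_of.setdefault(b, {b})
        if sa is not sb:
            sa |= sb                  # merge b's class into a's class
            for p in sb:
                self._class_of[p] = sa

    def disconnect(self, p):
        # Remove p from its class; the remaining points stay connected.
        self._class_of.setdefault(p, {p}).discard(p)
        self._class_of[p] = {p}

    def connected(self, a, b):
        return b in self._class_of.get(a, {a})

groups = CoincidenceClasses()
groups.connect("a", "b")
groups.connect("b", "c")
groups.disconnect("b")
```

With binary relations, removing "b" from the chain a-b-c would also separate "a" from "c"; with a class, "a" and "c" remain connected.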

DRAWING  WITH  CONSTRAINTS 

To explore the integration of constraints and direct manipulation,
we have built a drawing program called Briar² [3]. A direct
manipulation drawing technique called Snap-Dragging [1]
is augmented to specify constraints. Differential constraint
techniques are used to maintain these relationships as the
user modifies the drawing. Augmented Snap-Dragging also
serves as the basis for a visual representation for the constraints.

Snap-Dragging enhances the usefulness of gravity. The cursor
snaps not only to the edges of objects, but also to interesting
points in the scene such as intersections and vertices
of objects. Relations other than contact are created in Snap-Dragging
through alignment objects: objects that are not part
of the drawing per se, but exist only to be snapped to. The
original Snap-Dragging work includes several types of alignment
objects, each corresponding to types of relationships
which are useful in drawings. The usefulness of alignment
objects is further enhanced by making them easy to place.

Snap-Dragging provides two operations for positioning points
in two dimensions: snapping the cursor to a point, such
as a vertex, and snapping the cursor to an object’s edge or
curve. These operations correspond directly to Briar's two
basic constraints, “points-coincident” and “point-on-object”
respectively. The two snapping operations combined with
alignment objects allow a user to establish a wide variety of
relationships. Similarly, the two basic constraints are combined
with alignment objects to enforce a similarly large set
of relationships. For example, a distance constraint can be
expressed using a fixed size circle.
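
The circle example can be made concrete with a short Python sketch. The function names point_on_circle and distance_constraint are hypothetical, and the residual form (zero when the constraint holds) is an assumption of this illustration rather than Briar's internal representation.

```python
import math

def point_on_circle(point, center, radius):
    """Residual of a "point-on-object" constraint whose object is a circle:
    zero exactly when the point lies on the circle."""
    return math.dist(point, center) - radius

def distance_constraint(p, q, d):
    """Distance |p - q| = d, expressed as p lying on a circle of
    fixed radius d used as an alignment object centered at q."""
    return point_on_circle(p, center=q, radius=d)

residual = distance_constraint((3.0, 4.0), (0.0, 0.0), 5.0)
```

No new constraint type is needed: the distance relationship reuses the basic point-on-object constraint, with the fixed size circle acting as the alignment object.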

²It is called Briar because, like the plant it is named for, things stick
together inside it.

Augmented Snap-Dragging also provides the basis for Briar’s
visual representation of constraints. Constraints are displayed
just as they are specified: using the two basic elements
along with alignment objects. Although Briar can handle a
wide variety of relationships, users need not learn a large
number of constraint creation commands or display symbols.
Briar provides several methods for altering constraints by
direct manipulation of objects, including disabling constraint
maintenance and commands to “rip” parts of objects free of
their constraints.

Briar's display employs many mechanisms to convey its
state to the user. Objects light up when snapped to and
the cursor changes shape to indicate the type of snapping
operation. Newly established relationships are shown in distinctive
colors which signify whether or not they will become
constraints.

THREE  DIMENSIONAL  SYSTEMS 

Extending a system like Briar to three dimensional modeling
poses a new set of challenges. For modeling tasks, the set of
possible spatial relationships between objects is much richer,
and more complex, than in 2D. However, this richness and
complexity is also a strong motivation for the development
of constrained interaction techniques for 3D. Direct manipulation
techniques to establish spatial relationships are not
as developed as their two dimensional counterparts. A more
pragmatic concern is that our reliance on feedback already
causes Briar to use almost all available perceptual cues, such
as texture, hue, brightness, size, and motion, leaving little for
the increased visual demands of 3D.

Techniques such as augmented Snap-Dragging, differential
constraints, and visual alteration of constraints make it possible
to build systems which integrate constraints and direct
manipulation. Such systems can combine the power of representing
geometric relationships with the fluency and intuitive
interfaces which have made direct manipulation so successful.

Abstract

Interactive  modeling  systems  that  continually  maintain 
a  physically-realistic  representation  of  an  object 
combine  advantages  of  interactive  graphics  and  batch 
simulations.  In  this  paper  I  address  two  advantages  of 
incorporating  physics  into  Sculpt,  an  interactive  protein 
modeling  system.  First,  time-consuming  model 
correction  is  avoided  by  maintaining  a  physically-valid 
model  throughout  a  modeling  session.  Second, 
additional  cues  about  model  properties  can  arise  when  a 
chemist  interactively  guides  a  simulation  rather  than 
views  a  cine  loop  from  a  pre-computed  simulation.  I 
argue  these  benefits  with  examples  from  sessions  with 
Sculpt.  A  chemist  can  interactively  move  atoms  while 
Sculpt  automatically  maintains  proper  bond  topology 
and atom separations. Sculpt models bonded and non-bonded
atom interactions for medium-size proteins (800
atoms)  at  0.6  updates  per  second  on  a  Silicon  Graphics 
240  using  a  constrained  energy  minimization  method. 

CR Categories and Subject Descriptors: I.3.5
[Computer Graphics]: Computational Geometry
and Object Modeling; I.3.6 [Computer Graphics]:
Methodology and Techniques; J.2 [Computer
Applications]: Physical Sciences.

Additional Keywords and Phrases: Physically-based
modeling, interactive modeling, constraint
systems, scientific visualization.

1.  Introduction 

Within  the  last  ten  years  a  trend  in  computer  graphics  has  been 
to  increase  scene  realism  by  using  physically-based  models. 
Animators  use  physically-based  modeling  to  create  realistic 
detailed behavior. Most animations generated with physically-based
modeling, to date, required minutes to hours of
computation for each frame. This large computation time has
kept physically-based modeling out of interactive graphics
systems except with small, simple models. However, increased
computer speeds now permit adding physically-based models to
interactive  systems.  I  have  picked  a  large  modeling  problem 
with  simple  properties  to  study  issues  that  arise  in  modeling 
physical  properties  in  an  interactive  graphics  system. 

Protein  modeling  systems  represent  molecules  containing  one 
hundred  to  several  thousand  atoms.  The  systems  can  be 
classified  as  interactive  or  batch  (though  some  interactive 
systems  have  batch  processing).  Most  interactive  systems 
maintain bonded properties such as fixed bond lengths and
angles by restricting operations to rotation of segments about
particular bonds. The performance of interactive systems is
limited only by the display capability of the graphics system
since the modeling operations are only rotations. Batch
simulations model variance in bond lengths and angles and
interactions  among  non-bonded  atoms  over  relatively  near  and 
far  distances.  Accurately  modeling  all  these  properties  requires 
batch  computation,  even  for  small  proteins. 

Today  an  interactive,  physically-based  modeling  system, 
called  Sculpt,  models  non-bonded  atom  interactions  for 
medium-size  proteins  (800  atoms)  on  a  Silicon  Graphics  240  at 
0.6  updates  per  second.  Sculpt  lets  a  chemist  interactively 
move  atoms  while  automatically  keeping  correct  bonded 
properties  and  non-bonded  atom  separations  using  a 
constrained energy minimizer. Compared to many other
physically-based modeling systems in computer graphics,
Sculpt models simpler properties (e.g. angles versus volumes)
and minimizes static strain energies rather than functions of
object dynamics. However, system performance now allows
investigation into issues that arise when physically-based
modeling is applied to complex real applications.

Chemists who collaborate on the project believe interactive,
physically-based modeling will relieve many manual modeling
tasks, allowing more work in less time, and provide additional
cues about protein behavior. In this paper I present two
improvements the system provides that result from modeling
physical properties interactively. First, the system removes
the often laborious task of fixing a physically-invalid model
after a modeling session. Though interactive systems such as
Sybyl [15] maintain fixed bond lengths and angles, they make
the chemist keep non-bonded atoms at appropriate separations.
Second, the system provides a new medium for exploring
protein properties by allowing interactive, guided simulation.
This should combine benefits of interactive graphics and batch
simulations.

2. Related work

Physically-based  modeling  frequently  aids  computer 
animations  by  automating  detailed  motion  planning  and 
complex object interactions. Miller generates realistic snake
motions by modeling muscle contractions with springs and
friction against surfaces [9]. Witkin models the energy and
momentum of a Luxo lamp jumping hurdles and ski jumps [18].
Terzopoulos models energy in elastically deformable objects
such as cloth to create animations of flags [14]. These
examples  simulate  the  motion  of  objects  by  first  stating 
application-specific  conditions  about  the  objects  and  scene  and 
then  solving  Newton's  equations  of  motion. 

Similar  applications  use  constraints  to  restrict  the  allowable 
states  of  objects  and  express  dependencies  among  objects. 
Barzel uses constraints in animation to specify paths for
objects [4]. Witkin uses geometric constraints to assemble
models [16], and he describes a system that lets a user
interactively connect and manipulate objects such as a
mechanical assembly or tinker-toy [17]. Constraints maintain
constant volume in incompressible solids [12] and restrict
penetration when a ball strikes a trampoline [11].

3.  Driving  problem  -  protein  modeling 

A  protein,  to  a  first  approximation,  contains  fixed  bond 
lengths,  fixed  bond  angles,  and  some  planar  segments. 
Figure 1-A shows three sequential segments in a protein with
vectors representing bonds between atoms and gray areas
denoting planar regions. The only degrees of freedom in the
figure are rotations about the N-C and C-C bonds that enter and
leave each planar segment. A linear sequence of the segments
comprises the protein backbone. Attached to the atom between
each segment (C) are sidechains (not shown) with additional
fixed length and angle properties. Superimposed onto this
geometric model are non-bonded attractions and repulsions.
Attractions hold nearby atoms together, while repulsions
maintain a minimal separation between all atom pairs.

Chemists  often  use  brass  models  (Kendrew  models)  to  study 
geometric  properties  and  relationships  in  a  protein.  Brass 
models contain segments shown in Figure 1-A connected with
rotational joints about the N-C and C-C bonds. Manipulating
such a model with one's hands aids understanding of
relationships. However, the models have two drawbacks.
First, the model's size becomes difficult to hold and manipulate
when dealing with large molecules (e.g. an 800-atom brass
model of the protein in Color Plate 1 is 80 centimeters wide
when 2 cm of brass represents 1 Angstrom, a typical bond).
Second, brass models do not represent attractive and repulsive
interactions among non-bonded atoms.

Chemists use computers to model large proteins and non-bonded
atom interactions. Interactive modeling systems
resemble brass models by allowing only rotations about
particular bonds. The limiting factor in interactive systems is
display rate of the graphics machine. Batch simulations model
non-bonded atom interactions and more accurately model bond
lengths and angles (these do vary, though by only a few
percent).


Protein  modeling  provides  a  good  driving  problem  for  research 
in  interactive  physically-based  modeling.  First,  the  benefits 
of  interactive  graphics  and  batch  simulations  are  each  well 
established.  Second,  real  users  want  such  a  system  and  will 
provide  valuable  assistance  in  its  development.  Third,  the  size 
of  useful  models  requires  improved  algorithms  for  interactive 
modeling  on  current  machines.  Fourth,  many  aspects  of 
protein  modeling  are  similar  to  other  problems.  For  example, 
the  inherent  three-dimensional  structure  requires  addressing 
mechanical modeling issues similar to those encountered in
articulated-figure  motion  and  computer-aided  design.  Fifth, 
understanding  the  interplay  of  properties  in  proteins  during  the 
modeling  requires  good  visualization  paradigms. 

4.  Sculpt's  interface  and  performance 

Sculpt  continually  maintains  realistic  protein  properties  as  a 
chemist  moves  an  atom.  Sculpt  lets  a  chemist  move  an  atom 
by  first  attaching  a  spring  between  the  atom  and  the  cursor  and 
then dragging the cursor in a desired direction. Throughout the
dragging process, Sculpt polls the cursor position and adds the
strain energy of that spring to the energy in the protein.
Sculpt then finds a local minimum of the total energy that also
maintains  rigid  bond  lengths,  angles,  and  planar  segments. 
Sculpt  also  lets  a  chemist  insert  a  spring  that  continually  pulls 
an  atom  towards  a  given  three-dimensional  position. 
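
The tug described above can be sketched as one extra energy term added to the protein's energy. This is only an illustration: the quadratic (Hookean) spring form, the stiffness `k`, and the function names are assumptions, since the text does not give the spring function Sculpt actually uses.

```python
def spring_energy(atom, cursor, k=1.0):
    """Strain energy of a zero-rest-length spring (assumed Hookean form)."""
    d2 = sum((a - c) ** 2 for a, c in zip(atom, cursor))
    return 0.5 * k * d2

def spring_gradient(atom, cursor, k=1.0):
    """Gradient of spring_energy with respect to the atom position."""
    return tuple(k * (a - c) for a, c in zip(atom, cursor))

def total_energy(atom, cursor, protein_energy, k=1.0):
    """Energy a minimizer would see: protein terms plus the user's spring.

    protein_energy is a hypothetical callable standing in for the
    attraction, repulsion, and other terms of the model.
    """
    return protein_energy(atom) + spring_energy(atom, cursor, k)
```

Because the spring only adds energy, a tug that the constraints cannot follow stretches the spring instead of breaking the model.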

The  color  plates  show  photographs  of  Sculpt  sessions.  Depth- 
cued  vectors  represent  bonds  between  atoms;  cyan  denotes  the 
central backbone, and tan denotes sidechains connected to the
backbone. Gold coils show springs attached by a chemist to
pull atoms toward positions denoted by the gold thumbtacks.
Color Plate 1 shows a model containing 760 atoms of a
medium-sized protein called Felix [8]. The model contains
2205 constraints (bond length, angle, and others) and
approximately 8005 energy functions (attraction, repulsion,
and others). The backbone in Color Plate 1 winds through four
helices (purple cylinders highlight the two on the left). Color
Plate 2 shows a model composed of the two helices
highlighted in Color Plate 1. The model contains 355 atoms,
1027 constraints, and approximately .M50 energy functions.
The text in Color Plate 2 names several of the sidechains.

Sculpt maintains approximately 0.7 updates per second with
the model in Color Plate 1 and 1.5 updates per second with the
model in Color Plate 2, on a Silicon Graphics 240-GTX [2].
An update includes the following steps: evaluate protein
properties (bond lengths, angles, attractions, and repulsions)
and their derivatives, minimize the energy and satisfy the
constraints, update atom positions, and display the results.
Though this performance is twenty times too slow for smooth
interaction, our chemist collaborators believe the performance
already provides enough interactivity on medium-size proteins
that new, useful research can be accomplished that could not
previously be undertaken. The system is described in greater
detail in [13].

5. Maintaining a consistent model

Users of interactive modeling systems (not only molecular)
often unintentionally move objects into a configuration that
violates required properties of the application, producing an
invalid model database. Examples include moving an endpoint so
that an originally constrained line is no longer horizontal;
moving a wall without adjusting those adjoining it; leaving
cables dangling in a car engine after moving the alternator;
and moving atoms closer than electron shells allow.

Changing  a  computer  object  so  that  it  mimics  the  properties  of 
its  physical  counterpart  can  be  arbitrarily  complex.  Most 
modeling  applications  leave  this  task  to  the  user.  For 
example,  moving  a  wall  in  an  architectural  model  requires  that  a 
user  rejoin  all  the  adjacent  walls  and  then  ensure  those  changes 
did  not  invalidate  the  model.  Some  molecular  modeling 
systems  let  a  user  invoke  a  batch  energy  minimizer  to  move 
atoms  into  a  valid  arrangement.  However,  such  automated 
post-processing  methods  can  change  the  model  differently  than 
the  user  intends. 


An interactive modeling system that maintains a physically-valid
model throughout user modifications eliminates the
model re-idealization task. This section presents two
protein-modeling examples to illustrate complexities that can arise in
manual and automated methods for repairing the invalid
models.


5.1.  A  simple  edit  requiring  complex  repairs 

A common operation in molecular modeling requires flipping a
planar segment (peptide) in the backbone, surrounded closely
by neighboring atoms, by 180 degrees. Figure 1 shows two
stages of the flip operation. Figure 1-A shows the center
segment and its neighbors before a flip. Lines represent bonds
between atoms and hashed areas represent rigid planar
segments. Each atom contains an electron shell that (to a first
approximation) cannot intersect other electron shells.
Figure 1 represents the shells with circles (notice the circles
do not intersect in Figure 1-A). Most systems only allow
rotations about the C-C and C-N bonds so that bond lengths,
angles and planar groups do not change. This makes the flip
difficult by itself since one rotation affects all the atoms further
along the chain. Figure 1-B shows the center segment flipped
180 degrees after an appropriate sequence of rotations. The
model now requires repairs because the circles overlap.

Manual correction. A chemist can manually adjust the
atom positions to remove the intersections in Figure 1-B.
Moving an atom requires that a chemist choose appropriate
combinations of rotations so that other segments do not move.
Moving one atom usually causes interference with another,
which then requires additional repairs. Correctly fitting the
flipped segment often causes small changes that propagate
through the entire protein. In practice this problem is much
harder because a chemist fits spheres rather than circles and
approximates non-bonded atom interactions by getting the
spheres to touch. Professor Jane Richardson, a collaborator
from Duke University's Biochemistry Department, usually
adjusts models manually after operations such as this flip. This
example takes on the order of fifteen minutes.


(A) Before flip of middle, planar segment about
C-C and N-C bonds.

(B) After flip, electron shells overlap.

Figure 1: Modeling errors introduced by flipping
a rigid planar segment.

Batch  minimization.  A  chemist  can  also  use  a  batch 
minimization  package  to  remove  the  intersections.  Such 
packages  find  a  local  minimum  of  the  ensemble  energy 
associated  with  the  overlapping  shells.  These  work  well  if  the 
atom shells only slightly overlap. Overlaps greater than, say,
twenty percent contain very large strain energies that cause
minimization packages to make large changes to the model.
Batch routines often resolve such interactions by moving
atoms the chemist did not intend to change. Professor
Richardson interleaves some manual intervention with energy
minimization to avoid these undesirable changes.

Interactive minimization. Performing this operation in
Sculpt requires approximately thirty seconds (depending on the
size of the protein). A chemist tugs the atoms from one
orientation to another while Sculpt continuously adjusts
segments along the chain to accommodate the change.
Throughout the operation, Sculpt maintains a valid protein
model. Sculpt does nothing here that batch minimization
systems cannot perform. The difference is that the small
minimization time in Sculpt allows the system to continuously
minimize the energy rather than do it once after the user
interaction.


5.2. A complex task requiring exorbitant
re-idealization

This example requires changing the orientation of two helices
between Color Plates 2 and 3 by unwinding the lower helix,
counter-clockwise by ninety degrees, and winding the upper
helix, clockwise by ninety degrees, similar to unrolling a
scroll. The helical structure must remain after the operation.
The task first requires large structural changes to the model (to
twist the helices) and then local adjustments to remove
hundreds of contacts among the sidechains (tan vectors). Color
Plates  2  and  3  show  the  model  before  and  after  the  operation. 
Text  is  attached  to  nearby  sidechains  to  emphasize  the  change 
between  the  pictures;  yellow  indicates  nearby  sidechains 
before,  and  white  indicates  nearby  sidechains  after  the 
operation. 

Interactive  minimization.  Professor  Richardson 
performed  this  task  with  Sculpt  in  approximately  thirty 
minutes.  She  spent  most  of  the  time  turning  the  helices  by 
applying radial tugs to the atoms to get a uniform twist. (A
future version of the system will include rigid segments to
reduce the time for this operation.) The system maintained
proper  bond  lengths  and  angles  throughout  the  session.  She 
used  the  final  ten  minutes  of  the  session  arranging  sidechains 
to  change  the  contacts  among  their  atoms. 

Manual solution. Solving this task manually, without
energy minimization, is not feasible. One can turn the helices
in two ways. The first way requires choosing the appropriate
rotation angles between segments. This is an extremely
complex, inverse-kinematics problem involving hundreds of
joints. The second way involves breaking the connection
(backbone) between the two helices, rotating each helix, and
rejoining the connection. Rejoining the connection with
proper geometry is very difficult, though easier than the
inverse-kinematics problem. Once the two helices are turned, a
chemist must resolve hundreds of contacts between sidechain
atoms.  Professor  Richardson  attempted  to  solve  this  task 
manually  but  quit  after  several  frustrating  days  and  was  never 
fully  satisfied  with  the  results. 

Batch  minimization.  A  chemist  could  specify  target 
positions  for  some  atoms  (if  such  end  positions  are  known)  and 
invoke  an  energy  minimization  package.  The  minimizer 
chooses  a  path  to  move  the  atoms  along  towards  their  targets. 
Certain  paths  can  tear  the  model  apart  in  order  to  reach  the 
target (e.g. through the middle of a structure). Instead of
solving the problem with one minimization, a chemist may
choose subgoals along a path to the target and run
minimizations for each subgoal. This approach works better
than the manual solution, but the turnaround time between
subgoal minimizations limits the number of steps picked along
the path. Continuously running a minimization as a chemist
moves atoms to targets is the same as choosing an infinite
sequence of subgoals and running batch minimizations on
each.


6. Interactive, guided simulation

Interactive modeling of physical properties is essentially a
form of interactive, guided simulation. We hope that placing a
user in the computation loop of a simulation that once required
hours or days will provide greater insights into properties and
relationships in a model. This section discusses benefits of
interactive simulations compared to batch simulation and
interactive graphics without simulations and discusses
complications of scientific visualization in interactive
simulations.


6.1.  Simulation 

Simulations  can  illustrate  molecular  properties  not  easily 
incorporated  into  brass  models  such  as  attractions  and 
repulsions  between  non-bonded  atoms.  Though  a  chemist 
understands  individual  attractions  and  repulsions  between  two 
atoms,  comprehension  of  hundreds  of  simultaneous 
interactions becomes very difficult. Simulations are typically
used to examine specific atom interactions in a molecule. A
simulation requires that a chemist choose model parameters,
run the simulation, and view the results in a cine loop. If the
results  do  not  show  the  specific  interaction,  the  steps  are 
repeated  with  new  parameters.  Simulations  have  uncovered 
important  molecular  properties,  but  long  turnaround  times  have 
kept  this  from  being  a  common  exploration  tool  for  most 
researchers. 

Sculpt  lets  a  chemist  explore  non-bonded  interaction  while 
interactively  moving  atoms.  Professor  Richardson  believes 
interactively  exploring  protein  models  with  non-bonded 
interactions  will  improve  perception  of  subtle  relationships 
within  proteins.  In  several  sessions  Professor  Richardson  has 
seen  unexpected  reactions  that,  upon  closer  examination, 
resulted  from  non-bonded  interactions  competing  against  other 
properties  such  as  bond  rotations. 

Interactive modeling of physical properties augments benefits
from batch simulations with features from interactive graphics.
Today  chemists  use  interactive  graphics  to  study  a  static 
structure  or  series  of  structures  from  pre-computed  simulations. 
Interactively  controlling  the  view  and  display  parameters 
provides  more  cues  about  a  molecule's  structure  and  nature  than 
does  viewing  multiple,  static  images.  Guiding  an  interactive 
simulation  while  immediately  viewing  the  results  lets  the  user 
remain  continually  engaged  in  the  modeling  process.  I  believe 
this  provides  greater  situational  awareness  of  complex 
relationships  within  a  model  than  viewing  cine  loops  of 
simulations. Guiding an interactive simulation lets a user
stumble upon unexpected reactions in the model that may go
unnoticed in batch simulations (the Aha! phenomenon). Also,
more users will experiment with the models as turnaround time
is shortened.

One  advantage  batch  simulations,  viewed  with  cine  loops,  have 
over  interactive  simulation  is  the  ability  to  replay  the 
simulation.  Since  a  cine  loop  is  a  sequence  of  frames,  a  user 
can  easily  move  backwards  in  the  sequence  to  study  a  particular 
property.  Unless  a  system  saves  all  user  actions  during  an 
interactive simulation, a user cannot readily return to a
previous state. Like an on-going laboratory experiment, an
event cannot be repeated without re-running the experiment
from the beginning with the same steps.

6.2.  Visualization  of  non-bonded  forces 

Near-neighbor interactions among non-bonded atoms play an
important role in protein conformations by holding those
atoms together at fixed distances. A protein modeling system
should convey these interactions to help a chemist tightly
pack the protein's interior. These interactions, unfortunately,
are not as simple to display as a bond (vector connecting
atoms). Figure 2 plots the potential energy of the van der
Waals interaction between two atoms as a function of their
separation (1 Angstrom = 10^-10 meters). The plot shows a
maximum attractive (negative) energy at a separation of
R_m. The attraction weakens nonlinearly as the separation
increases from R_m. The energy becomes repulsive, increasing
at a different nonlinear rate, as the separation decreases from
R_m. Each atom in a protein, on average, interacts with ten
atoms within a six-Angstrom radius (the model in Color
Plate 1 contains 7,577 van der Waals interactions). A useful
visualization of a non-bonded interaction should convey the
type (attractive or repulsive), magnitude, and ideal separation,
R_m.

[Figure 2: potential energy versus separation for two atoms.]
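
The shape of the curve described for Figure 2 can be sketched with the widely used Lennard-Jones 12-6 form. This is an assumption: the paper does not state which functional form Sculpt uses, and the names `lennard_jones`, `r_min` (the ideal separation), and `depth`, along with their default values, are illustrative only.

```python
def lennard_jones(r, r_min=3.8, depth=0.1):
    """12-6 potential with minimum value -depth at separation r_min.

    r and r_min are in Angstroms; depth is in arbitrary energy units.
    Both default values are placeholders, not parameters from Sculpt.
    """
    ratio6 = (r_min / r) ** 6
    return depth * (ratio6 * ratio6 - 2.0 * ratio6)
```

In this form the energy is most negative at `r_min`, rises slowly toward zero as the atoms separate, and climbs steeply into positive (repulsive) values as they are pushed together, which matches the asymmetry the text describes.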


Sculpt displays van der Waals interactions that have an energy
magnitude greater than a user-defined threshold. A partial
spherical shell is placed around both of the interacting atoms
and aligned along a vector between them (see Color Plate 4).
Currently a shell with a solid angle of 0.4π steradians (ten
percent coverage) represents the weakest interaction. Solid
angle increases with the magnitude of the interaction. Weak
interactions are represented by dot spheres, and strong
interactions are represented by wireframe spheres. A dot
sphere indicates that an interaction exists without distracting
the user and consuming as much screen space as the wireframe
sphere. Blue denotes attraction, and red denotes repulsion.
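
The display rules above can be sketched as a small mapping from interaction energy to glyph. Only the minimum solid angle, the dot versus wireframe distinction, and the color convention come from the text; the `threshold` and `strong` settings, the linear growth of the solid angle, and the function name are assumptions made for illustration.

```python
import math

MIN_SOLID_ANGLE = 0.4 * math.pi   # weakest shell: ten percent of a full sphere
FULL_SPHERE = 4.0 * math.pi

def shell_glyph(energy, threshold=0.05, strong=0.5):
    """Return None or (color, style, solid_angle) for one interaction.

    threshold and strong are hypothetical user settings in energy units.
    """
    magnitude = abs(energy)
    if magnitude <= threshold:
        return None                            # below threshold: not displayed
    color = "blue" if energy < 0 else "red"    # attraction vs. repulsion
    style = "wireframe" if magnitude >= strong else "dots"
    # Assumed linear growth from the minimum angle up to a full sphere.
    t = min((magnitude - threshold) / (strong - threshold), 1.0)
    angle = MIN_SOLID_ANGLE + t * (FULL_SPHERE - MIN_SOLID_ANGLE)
    return color, style, angle
```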

Color  Plate  4  illustrates  this  visualization  on  a  small  model. 
The  photograph  shows  a  spring  attached  to  a  planar  ring 
(highlighted with a purple tube) that pulls one atom into
another. Notice the wireframe shells around the two atoms
labeled with text. The shells bend rather than intersect so that
the vectors in the two shells do not interfere visually;
intersecting wireframe shells are difficult to associate with
their respective atoms.

7. Adding physical modeling to interactive
graphics systems

The physically-based modeling module in Sculpt is inserted
into the control flow of an interactive graphics system with
minor modifications. The white boxes in Figure 3 list the
sequence of actions in the interactive graphics system: the
system receives a user action (e.g. mouse movement),
interprets it (move an atom by one Angstrom in a given
direction), applies the change to the model database (change
the coordinates of the atom), and displays the next frame. The
shaded box shows the additional step that modifies the user
action according to properties of the application (e.g. also
adjust distances to neighboring atoms).


Figure  3:  Steps  in  an  interactive  modeling  system 
for  processing  a  user  action. 

The  control-flow  presented  in  the  white  boxes  is  similar  to  the 
event loop of many graphics systems [7]. The remainder of
this section discusses some implementation issues addressed in
Sculpt  that  may  be  useful  to  others  wishing  to  incorporate 
physically-based  modeling  into  interactive  graphics  systems. 
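
The control flow of Figure 3 can be sketched as a single function that handles one user action. The function and parameter names are hypothetical, and the model database is reduced to a dictionary of atom positions; the point is only where the physically-based step sits relative to the ordinary event-loop stages.

```python
def process_action(raw_event, model, interpret, constrain, display):
    """Run one pass of the Figure 3 pipeline for a single user action."""
    action = interpret(raw_event)          # e.g. "move atom 7 by (1, 0, 0)"
    changes = constrain(model, action)     # shaded box: physically-based step
    for atom_id, position in changes.items():
        model[atom_id] = position          # apply action to the model database
    display(model)                         # draw the next frame
    return model
```

Passing the stages in as callables keeps the sketch independent of any particular graphics system; removing `constrain` (or making it return the action unchanged) recovers the plain interactive loop.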

7.1. Constrained minimization

Sculpt implements the shaded box in Figure 3 with a
constrained minimizer in the following manner. Sculpt
converts a user action into a potential energy function (e.g. a
spring to pull atoms). Sculpt then finds a local minimum of the
total system energy (from protein and user) that also satisfies
the set of bond length and angle constraints. Mathematically,
the minimizer solves the following problem:

Given:

    x               model state (e.g. vector of atom positions)
    Energy(x)       sum of potential energies in model
    Constraint(x)   vector of constraint functions

Solve:

    minimize Energy(x)
    such that Constraint(x) = 0.

The minimizer finds the solution using a method of Lagrange
multipliers as discussed in [6], [17] and [13]. The minimizer
determines changes in atom positions. The changes are sent to
the  next  module  in  Figure  3  (Apply  action)  which  then  updates 
the  model  database. 
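
A minimal sketch of this kind of problem, for one atom in two dimensions, is given below. It is not the method of [6], [17] or [13]: it uses a simple gradient-projection iteration with a first-order multiplier estimate, and the energy (a cursor spring), the constraint (a rigid bond to the origin), the step size, and all names are assumptions chosen to keep the example small.

```python
import math

def minimize_on_bond(cursor, length=1.0, k=1.0, step=0.1, iterations=100):
    """Minimize a cursor-spring energy for one atom bonded to the origin.

    Energy(x)     = 0.5 * k * |x - cursor|^2
    Constraint(x) = |x|^2 - length^2 = 0
    """
    x = [length, 0.0]                       # starting point on the bond circle
    lam = 0.0
    for _ in range(iterations):
        g = [k * (x[i] - cursor[i]) for i in range(2)]   # gradient of Energy
        c = [2.0 * x[i] for i in range(2)]               # gradient of Constraint
        # Multiplier estimate: the part of g that the constraint must absorb.
        lam = (g[0] * c[0] + g[1] * c[1]) / (c[0] * c[0] + c[1] * c[1])
        # Step along the gradient with the constrained direction removed.
        x = [x[i] - step * (g[i] - lam * c[i]) for i in range(2)]
        # Pull the point back onto the constraint surface.
        norm = math.hypot(x[0], x[1])
        x = [length * x[i] / norm for i in range(2)]
    return x, lam
```

For a cursor at (3, 4) the atom settles at the point of the unit circle nearest the cursor, where the energy gradient is parallel to the constraint gradient; that parallel condition is what the Lagrange multiplier expresses. The step size of 0.1 is tuned to this example and would need to shrink for stiffer springs or more distant cursors.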

Other constrained-minimization approaches fit within the
framework of Figure 3. Witkin minimizes a potential energy
function [16] associated with the physical state of elastic
models. Amburn minimizes costs associated with design goals
[3]. Phillips uses kinematic constraints to reduce allowable
joint movements in an articulated figure while minimizing
costs associated with the positioning goals [10].

7.2. Positioning

Direct  versus  indirect  positioning.  Directly  moving 
an  object  to  a  new  location  can  violate  constraints.  For 
example,  moving  one  end  of  a  fixed-length  line  segment 
extends  its  length  if  the  other  end  cannot  move.  Indirect 
positioning  by  attaching  a  spring  to  an  object  and  tugging  the 
other  end  avoids  this  problem.  If  no  opposing  force  prevents 
movement  in  the  direction  of  the  tug,  the  result  is  the  same  as 
direct  manipulation.  However,  if  the  object  cannot  move  in  the 
direction  of  the  tug,  the  indirection  increases  the  potential 
energy  in  the  system  (because  the  spring  stretches)  but  does 
not  invalidate  the  model. 

Tugging  objects  also  lets  a  user  move  atoms  from  one  local 
minimum  to  another.  Figure  4  shows  an  example  where  user 
intervention  overcomes  a  local  energy  minimum.  Arrows  show 
the direction and strength of the attractions among atom T and
the fixed-position atoms F1 and F2. Figure 4-A shows the
initial state with atom T attracted more by F1 than F2. The tug
in Figure 4-B (indicated with the dashed arrow) pulls the atom
towards F2. Figure 4-C shows the final result.


(A) Initial attractions
(B) Attractions with a user tug
(C) Final attractions after tug

Figure 4: A user spring pulls atom T between
energy minima.
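
The three stages of Figure 4 can be reproduced in a one-dimensional sketch. Everything numerical here is an assumption: the Gaussian-shaped attractions, the positions of the fixed atoms, the spring stiffness, and the plain gradient-descent minimizer are stand-ins chosen so the example is easy to follow, not Sculpt's energy functions.

```python
import math

F1, F2 = 0.0, 10.0      # fixed-position atoms (hypothetical 1-D coordinates)
WIDTH = 2.0             # assumed range of each attraction

def attraction_gradient(x):
    """Derivative of two Gaussian-shaped attractive wells centered on F1 and F2."""
    total = 0.0
    for center in (F1, F2):
        d = x - center
        total += (d / WIDTH**2) * math.exp(-d * d / (2.0 * WIDTH**2))
    return total

def relax(x, tug_target=None, k=0.2, step=0.5, iterations=2000):
    """Gradient descent on the attractions plus an optional user spring."""
    for _ in range(iterations):
        g = attraction_gradient(x)
        if tug_target is not None:
            g += k * (x - tug_target)     # gradient of 0.5 * k * (x - target)^2
        x -= step * g
    return x

start = relax(1.0)                      # (A) T settles in the minimum near F1
tugged = relax(start, tug_target=F2)    # (B) the spring drags T toward F2
final = relax(tugged)                   # (C) spring removed; T stays near F2
```

Without the tug, the minimizer alone would never leave the well near F1; the spring temporarily reshapes the energy so that the only minimum lies near F2, and the atom remains there once the spring is released.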

Which  physics?  Dynamics  or  statics.  Physically- 
based  modeling,  as  it  has  most  often  been  used  in  computer 
graphics,  aims  to  determine  physically-realistic  motions  and 
trajectories  of  objects  with  specific  physical  properties  (e.g. 
blowing flags [14] and jumping Luxo [18]). The approach
solves Newton's second law, F = ma, which gives the
acceleration of objects, and combines this with an initial
position and velocity to determine motions.

In this work, stable conformations, not the trajectories for
reaching them, are the concern. Sculpt achieves this by
modeling potential energy rather than forces in a model. This
gives strains between objects that the system minimizes. The
objects never contain velocity information. Minimizing the
potential energy (strain) moves the objects but does not induce
momentum. This technique provides greater control over
object positions and takes less computation.


7.3.  Approximating  stiff  model  components  with 
constraints 

Sculpt  makes  an  approximation  that  dramatically  improves 
performance  without  appreciably  decreasing  accuracy. 
Properties  whose  deformation  requires  very  large  strain 
energy relative to others in a model are replaced by rigid
constraints.  For  example,  a  bond  length  is  constrained  to  its 
ideal  value  since  the  potential  energy  increase  for  extending  a 
bond  is  five  orders-of-magnitude  larger  than  that  associated 
with  a  comparable  increase  in  distance  between  two  non- 
bonded  atoms. 

Minimizing  functions  with  similar  potential  energies,  subject 
to  constraints,  requires  significantly  less  computation,  in  this 
application,  than  minimizing  all  the  energies  without 
constraints.  Minimizing  potential  energy  functions  requires 
time-steps  small  enough  to  model  the  stiffest  properties 
accurately.  The  time-step  must  decrease  as  the  potential  energy 
separation  among  the  functions  increases.  Minimizing  all  the 
potential energy functions requires time-steps orders-of-magnitude
smaller and, therefore, requires orders-of-magnitude
more steps per screen update!
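
One way to see why the stiffest term dictates the cost: for explicit gradient descent on a quadratic energy 0.5 k x^2, the update x <- x - h k x only converges when h < 2/k, so the largest k sets the step for every other term. The sketch below assumes that simple explicit scheme (the text does not say which minimizer Sculpt uses), and the stiffness values and the function name `steps_to_converge` are illustrative only.

```python
def steps_to_converge(stiffnesses, tolerance=1e-3, max_steps=10_000_000):
    """Count gradient-descent steps needed to relax independent quadratic
    energies E_i = 0.5 * k_i * x_i^2, all starting from x_i = 1.

    The step size is 1 / max(k), a stable choice for the stiffest term
    (the explicit update diverges once the step exceeds 2 / max(k)).
    """
    step_size = 1.0 / max(stiffnesses)
    xs = [1.0] * len(stiffnesses)
    for step in range(1, max_steps + 1):
        xs = [x - step_size * k * x for x, k in zip(xs, stiffnesses)]
        if max(abs(x) for x in xs) < tolerance:
            return step
    return max_steps


# Only soft terms: a large step is stable and few steps are needed.
soft_only = steps_to_converge([1.0, 2.0])
# Adding one term 1000 times stiffer forces a tiny step on the soft terms too.
with_stiff = steps_to_converge([1.0, 2.0, 1000.0])
```

Replacing the stiff term with a rigid constraint removes it from the energy altogether, so the step size is again set by the soft terms.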

Is this approximation valid? Approximating bond-length,
potential  energy  functions  with  rigid  constraints  reduces  the 
accuracy  of  the  physical  model.  However,  the  large  potential 
energy  signifies  that  bond-length  variability  is  orders-of- 
magnitude  smaller  than  the  variability  of  other  properties. 
Since  the  bond  lengths  hardly  change,  constraining  them  for 
increased  performance  is  justified.  Sculpt  lets  a  chemist  trade 
performance  for  accuracy  when  desired,  by  modeling  lengths 
with  potential  energy  functions. 

An important principle influences this approximation: only
compute  what  is  significant.  Sculpt  follows  this  by  only 
accurately  modeling  properties  that  can  vary  significantly  and 
constraining  the  others.  This  approach  can  prove  useful  in 
other  applications  with  wide  variability  in  energy  magnitudes. 

8.  Other  applications 

The need to remove model re-idealization and to enhance understanding
of model properties will most likely arise in other interactive
applications that incorporate physically-based modeling. The
particular benefits and implementations are specific to the
applications. However, similarity between the control-flow in
Sculpt and other applications suggests that a generic,
physically-based modeling module may eventually be
developed. For now, the system development effort may be
overkill for simple modeling applications and only justified
for complex modeling applications. I conclude with two
example applications that can benefit from adding
physically-based modeling.

8.1.  Architectural  layout 

Simple changes in a modeling system for architectural models
(e.g. blueprints) often require numerous operations. For
example, narrowing a corridor requires moving the corridor
walls and lengthening the walls that connect to it. A large
portion of the effort in the Building Walkthrough project [1] at
the University of North Carolina at Chapel Hill is spent fixing
and maintaining databases of models (these databases contain
approximately  4,000  to  30,000  polygons).  An  automated 
radiosity  calculation  followed  by  viewing  uncovers  modeling 
errors,  including  walls  not  connected  to  ceilings  and  doors 
outside  the  plane  of  their  walls.  Most  of  the  errors  arise  from 
previous  database  edits  that  left  parts  of  the  model 
inconsistent. 

Applying  constrained  minimization  to  this  application  reduces 
these  burdens.  In  the  corridor  example,  constraints  can  require 
that  moving  the  corridor  wall  also  moves  the  connecting  walls. 
Additional  cost  functions  can  increase  as  certain  goals  are  not 
met  such  as  rooms  containing  a  certain  area  or  being  a  given 
distance  from  an  exit. 


8.2.  Drafting 

Most interactive drafting and drawing systems ignore
application-specific properties to reduce computation and
broaden product applicability. They base operations (e.g.
move, stretch) on individual geometric primitives (polygons,
lines, control points, etc.). Information regarding an object's
construction is usually discarded. For example, MacDraw II
lets a user construct a line constrained to the horizontal, but
discards the horizontal requirement after construction [5]. The
package  does  not  restrict  the  line  to  the  horizontal  if  a  user 
later  moves  one  of  its  endpoints.  Keeping  information  about 
an  object's  structure  and  properties  allows  a  system  to  maintain 
a  consistent  model  throughout  model  editing. 

9.  Future  work

The  immediate  goal  for  future  work  is  to  place  Sculpt  in  a 
chemistry  lab  and  gather  results  about  its  usefulness  for 
solving  daily  protein-modeling  problems.  This  should  offer 
direction  for  future  enhancements  to  the  modeling  and 
visualization  components  of  the  system. 

The  visualization  issues  offer  large  scope  for  future  research. 
The near-neighbor visualization discussed in this paper is
adequate, though not great. I will continue to examine the
near-neighbor interactions. A much harder property to visualize is
long-distance,  electrostatic  interaction.  These  interactions  can 
extend  between  atoms  on  opposite  sides  of  a  molecule. 
Visualizing  these  interactions  will  be  hard. 

Finally,  I  plan  to  apply  the  techniques  described  in  this  paper 
to  other  applications.  The  input  to  the  Sculpt  system  is  a  list 
of  points  with  a  set  of  length  and  angle  functions  defined  on 
the  points.  Under  one  thousand  lines  of  modeling  code  (out  of 
ten  thousand)  is  specific  to  molecules.  With  this  framework  I 
hope  to  examine  interactive  manipulation  of  skeletal  figures 
without  significant  system  development. 
ABSTRACT

The 3D components of today's user interfaces are still underdeveloped.
Direct interaction with 3D objects has been limited thus far
to gestural picking, manipulation with linear transformations, and
simple camera motion. Further, there are no toolkits for building
3D user interfaces. We present a system which allows experimentation
with 3D widgets, encapsulated 3D geometry and behavior. Our
widgets are first-class objects in the same 3D environment used to
develop the application. This integration of widgets and application
objects provides a higher bandwidth between interface and application
than exists in more traditional UI toolkit-based interfaces. We
hope to allow user-interface designers to build highly interactive
3D environments more easily than is possible with today's tools.

Keywords 

User  Interface  Design,  Widgets,  3D  Interaction,  Virtual  Reality 

1  Introduction

Modern user-interface software is built using widgets, objects with
geometry and behavior used to control the application and its objects.
However, most of today's user interfaces for 3D applications
take little advantage of the third dimension's added power,
predominantly using 2D widgets. Commercial modeling and visualization
systems typically present one or more 3D views surrounded by a
large, hierarchical menu system, often with supporting dialog boxes
and sliders. The menu system is sometimes replaced or augmented
by another 2D interface widget such as a network or hierarchy editor.
Direct interaction with the 3D world is limited primarily to
interactive viewing, selection, translation, and rotation. 3D widgets
used in these interactions include a 3D cursor, gestural translation, a
virtual sphere, and direct manipulation of 3D spline points on paths
or patches. While today's 3D applications clearly allow users to
be productive with the current interface technology, we believe that
they could be improved significantly by making greater use of 3D
in the interface itself.

In virtual-reality systems, 3D interaction is especially crucial.
However, the significant difficulties of 3D input and display have
led research in virtual worlds to concentrate far more on the
development of new devices and device-handling techniques than on
higher-level techniques for 3D interaction [19]. Such interaction
goes no further than a straightforward interpretation of device data,
such as using a Polhemus for a head tracker or a DataGlove for simple
gestural recognition of commands such as select, translate and
rotate. Some virtual-reality systems make use of menus floating in
3-space with 3D icons instead of 2D pixmap icons [3]. Besides the
additional options for its position, however, such a menu provides
no more expressive power than its 2D equivalent.

There are many reasons for the underutilization of 3D. First, almost
all interaction techniques must be created from scratch, since
essentially no toolkits of 3D interaction techniques exist. Second,
such toolkits are difficult to develop until metaphors for 3D
interfaces grow beyond their current infancy. Finally, we believe such a
toolkit is intrinsically more difficult to create than its 2D counterpart
because of the inherent complexity of 3D interaction.

Widget toolkits are well known for 2D applications (e.g., the
Macintosh Programmer's Toolbox, OSF/Motif, XView) [17]. However,
3D graphics libraries such as PHIGS+ and SGI's GL provide
very little support for interaction beyond simple device handling.
The industry standard PHIGS+ provides only six widgets (pick,
locator, stroke, choice, valuator, and string). Further, the application
programmer cannot change their look or feel, and all except 3D pick
correlation are low-level, providing little functionality beyond that
provided by a physical device. Thus, application developers are
left to implement basic interactive techniques such as virtual sphere
rotation themselves.

Most paradigms and metaphors for 3D interfaces are less developed
than those for 2D interfaces. Some 3D metaphors are the
natural analogs of those familiar in 2D, such as 3D menus and
rooms [14] [4]. However, research in 3D interfaces must develop
new metaphors and interaction techniques to take advantage of the
greater possibilities of 3D. The cone tree and perspective wall,
designed at Xerox PARC [22] [13], demonstrate the potential of 3D
representation and interactive animation.

User interfaces are inherently difficult to program [17]. 3D
interfaces complicate interface design and implementation, since the
interface must take into account such issues as a richer collection
of primitives, attributes, and rendering styles, multiple coordinate
systems, viewing projections, visibility determination, and lighting
and shading. Further, 3D environments allow many more degrees
of freedom than those easily specified with common interface hardware
like mice. The interface can easily obscure itself, and 3D
interaction tasks can require great agility and manual dexterity.
Indeed, physical human factors are a central part of 3D interface
design, whereas 2D interface designers can assume that hardware
designers have handled the ergonomics of device interaction.

This paper reports some first steps towards the goal of creating a
richly interactive 3D application development environment. After a
more detailed discussion of the problems inherent in designing and
implementing 3D widgets, we present a framework under development
for their implementation, design, and use. By working with
an object-oriented notion of a widget, we hope to provide a toolkit
of modifiable and reusable 3D interaction techniques.

2  Extending  Widgets 

There are several points to consider when designing an environment
for developing 3D widgets. Most fundamentally, what is a
widget? How do existing notions of widgets derived from 2D
environments extend to 3D environments? Secondly, how should a 3D
application communicate with its 3D interface? Finally, what kinds
of primitives are needed to build 3D widgets? 2D environments,
like the X Window System, provide raster drawing primitives and
event-based callback mechanisms. What sorts of primitives should
a corresponding 3D environment provide?

2.1  Defining  "widget"

We define a widget as an encapsulation of geometry and behavior
used to control or display information about application objects.
Although this definition is somewhat vague and general, it has the
advantage of covering all the areas of the interface literature we
have explored, from general constructs such as Garnet's Interaction
Objects [16] and the Interactive Objects of Xerox's 3D Rooms [21]
to very specific kinds of widgets such as those found in the X Toolkit
or the Macintosh Toolkit.

The  extent  to  which  a  2D  widget  should  be  classified  as  consisting 
of  behavior  or  of  geometry  varies  widely.  Some  useful  widgets  are 
primarily  geometric,  such  as  the  dividing  lines  and  frames  that 
serve  to  organize  and  partition  an  interface.  Others,  such  as  a 
gestural  rotation  widget  in  an  object-oriented  drawing  program, 
have  no  inherent  geometry.  3D  widgets  encompass  a  similar  range 
of geometry and behavior. This makes our definition of the term
"widget" useful for understanding interface problems that are not
dimension-specific. 

2.2  Comparing  common  2D  widgets  and  3D
widgets

Despite their often complex appearance, most 2D widgets have very
simple behavior. They commonly have few degrees of freedom
(usually only one) and support only a small range of values within
a degree of freedom. Thus, while toggle buttons have bitmap icons
to represent different states, they represent only a single bit of
information, and similarly, sliders represent a single number within
a range, usually only a small integer range.

3D space inherently has more degrees of freedom than 2D space;
a rigid flying body has six degrees of freedom in 3D versus three
in 2D. 3D graphics libraries are, in general, more capable of
handling general transformations than their 2D counterparts. As noted,
common 2D widgets rarely take advantage of all the degrees of
freedom available to them. The use of multiple degrees of freedom
to enhance interaction is thus largely unexplored potential,
even in 2D [23], and 3D, with its greater degrees of freedom, has
correspondingly greater potential. This potential must of course be
handled with restraint; while we would like to be able to use several
degrees of freedom simultaneously, using too many may make the
widget too difficult to use. Rather, interface designers should be
able to specify any subset.

The user interacts with most widgets, whether 2D or 3D, through
manipulation involving motion and simple gestures that are
interpreted directly, to produce, for example, a sliding button or a
popup window. However, the user can gain more expressive power
through interaction techniques that interpret and process movements
and make possible more sophisticated interaction. For example, a
calligraphic drawing program can attach a pen to a cursor by means
of a simulated spring [9], a simple motion-control technique that
makes possible a whole new range of drawings not easily created
with a rigid pen-cursor linkage.

Both 2D and 3D widgets can benefit from more sophisticated
reaction to user input. Interaction can potentially achieve substantial
gains by using such techniques as dynamic constraints, inverse
kinematics, and physical simulation as components of direct
manipulation interfaces. These techniques currently appear only in
systems designed explicitly to present or use them, such as demos
or prototypes, but in the future, these techniques should be as
accessible as any other component in the widget designer's repertoire [8].

2.3  Integrating  the  application  and  the  user 
interface 

User interfaces were originally designed by application programmers
using the same tools they used to build applications. This
produced interfaces that were tightly integrated with the application.
Recently, however, interface design is more often done by specialists
using UI development tools [17]. While this separation produces
more consistent interfaces and more modular programs, it can also
produce interfaces that are not as helpful as they could be if they
were more specialized to the application: the interface designer
is not only aided but also limited by the toolkit and its metaphors.
In particular, as has been noted by those critiquing WIMP
interfaces [8], today's toolkits are not oriented towards highly interactive
applications.

Such highly interactive applications require a high bandwidth
between the application and the user interface, particularly for
semantic feedback [8]. Prior UI research indicates that this may be
best accomplished if the application and the interface are part of the
same development environment, with the same tools being used to
build both [18]. An integrated environment has additional software
engineering benefits. First, only a single paradigm must be learned,
rather than one for the interface and another for the application.
Also, separate paradigms can be hard to integrate at several levels:
the conceptual level, the code implementation level, and the
compile-debug level. Advocating integration is not a call to abolish
modularity in application and interface design. Rather, it is a
suggestion that the principles of modularity can be pushed too far.
The reasons for separating the application from the user interface
are valid, but the benefits of a single development environment may
outweigh the benefits of using two, especially for 3D applications.

Consider the benefits of higher bandwidth between the application
and the interface. A menu selection is a relatively small amount
of input that specifies only an operation, operand, or attribute,
leaving other parameters to be specified elsewhere (perhaps in another
menu or a dialog box). Gestural interfaces, on the other hand, allow
the user to specify operation, operand, and parameters in a single
action [23], providing a faster interface and commands that do not
depend on previous or further actions.

In addition to providing better input, a tighter integration between
application and interface lets the application provide semantic
feedback while the user is interacting. Structured program editors have
provided this kind of functionality for many years through syntax
checkers that check for or prevent syntactic errors as the user types.
Similarly, some 2D graphical circuit design tools prevent the user
from making physically impossible or illogical connections.

Existing UI toolkits do allow callbacks to alter a widget based on
application feedback, but the mechanisms to do so are often clumsy
and hard to use. Our interfaces are constructed in an environment
called UGA [25] in which widgets can actively depend on the state
of other widgets, in the same way that any other objects (e.g., the
application's objects) in our system can depend on each other. Our
widgets are not external to the application model. They are
first-class objects, indistinguishable from application objects. This
provides the UI designer with all of our system's power for specifying
behavior and geometry, and gives as high a bandwidth between
application and interface as between application objects themselves,
creating the possibility of interfaces that are tightly coupled with
the application, both for input and for output.

We have advocated both 3D widgets and widgets that are tightly
integrated with an application. The latter idea is the more powerful
of the two, since it can apply to all areas of interface design. In
the remainder of the paper, we consider tools applicable to
integrated widgets and then examine some case studies of integrated
3D widgets.

3  Tools  for  Designing  and  Implementing 
Integrated  Widgets 

3D interfaces are presently too underdeveloped for us to specify a
comprehensive library of tools for building useful interfaces. We
have therefore devised an environment that provides a great degree
of flexibility to design new 3D widgets. It is often pointed out that
flexibility in a user-interface design environment is a double-edged
sword, allowing novel and useful interfaces as well as novel and
useless interfaces. Because of the undeveloped state of current
3D interfaces, however, we prefer to allow the possibility of some
poorly conceived designs rather than rule out unexplored possibilities.

3.1  Dependencies  and  controllers 

UGA supports the geometric components of widgets through its rich
modeling environment. The system supports the behavioral aspects
of widgets through one-way constraints called dependencies [25].
An object can be explicitly related to another object by using a
dependency. Since widgets are first-class objects in UGA, they can
use this dependency mechanism as easily as application objects can.
For example, a cube can become a simple slider by constraining it
to move only along its x axis, and a torus's inner radius can then
depend on the x position of the cube.
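
The cube-and-torus example can be sketched as a one-way dependency that is evaluated whenever the dependent property is read. This is only an illustration of the idea, not UGA's mechanism or API: the class names `SceneObject` and `AxisSlider`, the methods `depend`, `get`, `set`, and `drag_to`, and the clamping range for the radius are all hypothetical.

```python
class SceneObject:
    """An object with named properties, some of which may be driven
    one-way by a property of another object."""

    def __init__(self, **properties):
        self._values = dict(properties)
        # property name -> (source object, source property, mapping function)
        self._links = {}

    def depend(self, name, source, source_name, mapping=lambda value: value):
        """Make this object's `name` a one-way function of source's property."""
        self._links[name] = (source, source_name, mapping)

    def get(self, name):
        if name in self._links:
            source, source_name, mapping = self._links[name]
            return mapping(source.get(source_name))
        return self._values[name]

    def set(self, name, value):
        if name in self._links:
            raise ValueError(f"{name} is driven by a dependency")
        self._values[name] = value


class AxisSlider(SceneObject):
    """A cube whose motion is restricted to its x axis."""

    def drag_to(self, x, y, z):
        # Discard the y and z components so the cube only slides along x.
        self.set("x", x)


cube = AxisSlider(x=0.0, y=0.0, z=0.0)
torus = SceneObject(inner_radius=0.5, outer_radius=2.0)
# The torus's inner radius follows the cube's x position, clamped to a
# plausible range (the range is an arbitrary choice for this sketch).
torus.depend("inner_radius", cube, "x",
             mapping=lambda x: max(0.1, min(1.5, 0.5 + x)))

cube.drag_to(0.4, 3.0, -2.0)
radius_after_drag = torus.get("inner_radius")
```

The dependency is one-way: dragging the cube changes the torus, but nothing the torus does can move the cube, and an attempt to set the driven property directly is rejected.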

To provide multi-way constraints and cyclical constraint
networks [18], we use controllers [25], objects whose primary purpose
is to control other objects. Thus, our dynamic constraint solver is
encapsulated as a controller. Additionally, we encapsulate physical
devices as controllers that filter and pass values to objects. Finally,
we can use controllers to encapsulate simulation methods, such as
inverse kinematics or collision detection. By employing controllers,
widgets can make use of general constraints, hardware devices, and
simulation techniques.

3.2  A  dialog  model  for  sequencing

Some researchers choose to separate UI design into two broad
categories: data-oriented UI design, usually supported through
constraints, and dialog-oriented UI design [11]. We find both models
useful. In addition to the data-oriented mechanisms of dependencies
and controllers, we provide a dialog model that uses augmented
transition networks (ATNs). We use ATNs because the sequencing
of an interface is explicitly declared and is more easily visualized
in a hierarchical ATN than in context-free grammars or event
systems [7].

A simple transition network is a finite-state automaton (FSA).
A complex interface can be described as an FSA but the complexity
produces a combinatorial explosion of FSA states. Augmented
transition networks handle some of the limitations of simple FSAs
(allowing such behaviors as definite loops without specifying
intermediate states) by adding variables and conditional transition along
arcs based on the values in the variables. Recursive transition
networks are used to provide hierarchy for ATNs, by allowing control
in one ATN to be suspended until a recursively invoked ATN reaches
its final state.

Normally, an ATN, even a recursive one, has only one current
state. Therefore, some events that can happen at any time, such as
an "abort" or "help" request, are especially cumbersome to specify,
requiring an additional arc from every state in the ATN. By contrast,
event systems have greater expressiveness than ATNs [7], since they
can easily handle an "abort" or "help" event by simply adding a new
event handler to process this event. This would seem to make event
systems a better choice. However, notions of current state, history,
or context are more difficult to express in event systems. Consider a
"help" event that should provide context-sensitive information. An
event model must provide a different event for each context. On
the other hand, an ATN can handle a uniform "help" event, with
arcs corresponding to context-dependent actions looping back to
each state or leading to one or more help states. We would like
a dialog model that combines the best features of both ATNs and
event handlers.

Thus, we modify the ATN model to allow possibly disconnected
components of the state graph and more than one active state [12].
We can now represent a set of event handlers as a group of
disconnected states in an ATN, one state per event handler, each with a
single arc back to itself. The arc's input tokens represent the
corresponding event handler's events, and the arc's action represents the
handler routine. However, we can add explicit sequencing to this
ATN. For example, in our model, it is easy to specify the sequence
of events found in snap-dragging, described in Section 4.3, but
relatively cumbersome to specify in an event model, because of the
need to represent history.
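
The modified model can be sketched as a transition network that keeps a set of active states instead of a single current state. In the sketch below, a "help" handler is a disconnected state with one arc back to itself, while a separate two-state component sequences a drag. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Arc`, `DialogNet`, `feed`, and `show_help`, and the event names, are hypothetical, and ATN variables and recursion are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Arc:
    source: str
    event: str
    target: str
    action: Optional[Callable[["DialogNet"], None]] = None


class DialogNet:
    """Transition network that may have several active states at once."""

    def __init__(self, arcs, active):
        self.arcs = list(arcs)
        self.active = set(active)
        self.log = []

    def feed(self, event):
        """Offer one event to every active state."""
        next_active = set()
        for state in sorted(self.active):
            arc = next((a for a in self.arcs
                        if a.source == state and a.event == event), None)
            if arc is None:
                next_active.add(state)  # no matching arc: stay in this state
                continue
            if arc.action is not None:
                arc.action(self)
            next_active.add(arc.target)
        self.active = next_active


def show_help(net):
    # The handler can see the other active states, so one uniform "help"
    # event still yields context-sensitive behavior.
    context = sorted(net.active - {"help_handler"})
    net.log.append(f"help requested in {context}")


net = DialogNet(
    arcs=[
        Arc("help_handler", "help", "help_handler", show_help),  # event handler
        Arc("idle", "mouse_down", "dragging"),                   # sequencing
        Arc("dragging", "mouse_up", "idle"),
    ],
    active={"help_handler", "idle"},
)

for event in ["help", "mouse_down", "help", "mouse_up"]:
    net.feed(event)
```

The same "help" event is handled by a single arc, yet the logged context differs depending on where the drag component is in its sequence.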

Our dialog model also allows a clean separation of subparts of
the interface (i.e., individual widgets or groups of widgets). The
dialog specification of each subpart can be represented as a subgraph
of the ATN that describes the specification of the entire interface.
These subparts can run in parallel, corresponding to a situation in
which several widgets are logically operating at the same time. This
parallelism is very useful; we can, for example, use the mouse to
control both a 3D cursor and a higher-level widget, such as the rack
described in Section 4.5.

The  components  of  this  dialog  model,  such  as  the  individual  states 
in  the  ATN,  are  first-class  objects  in  our  system.  Since  the  dialog 
model  is  embedded  in  the  same  environment  as  the  application  itself, 
dependencies  can  be  used  to  establish  the  connections  between  the 
ATN  and  the  application  that  allow  each  to  modify  the  other. 

3.3  Applying  object  construction  techniques 

The UGA system supports a rich set of modeling primitives and
operations, including constructive solid geometry (CSG), volumetric
sculpting, spline patch objects and deformations. Both geometric
and non-geometric modeling techniques, such as hierarchical
grouping, can be applied to widget creation. Geometric techniques
are used to specify a widget's geometry. Correspondingly, since
ATN states are first-class objects, they can be organized using
non-geometric object grouping techniques. Thus, both a widget's
geometry and behavior are specified in the same unified framework,
the framework of the application objects it controls.

The underlying construction technique we use is delegation,
where one object (the child) is created from a pre-existing object
(the parent) [24] [10]. If the parent object is changed, the child
changes as well. Since both the parent and its children are objects
in the system, and any object can be a controller modifying other
objects, one of the children can modify the parent object, and therefore
modify itself and all of its siblings. Delegation provides the ability
to change large portions of the interface at once. Furthermore, since
delegation relationships are maintained at run time, we can modify
the interface without recompiling. This allows rapid prototyping of
interface designs.
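
Delegation as described here can be sketched with objects that look up any property they do not define themselves on their parent, at the moment the property is read. The sketch is illustrative only: the class name `Prototype`, the `get`/`set` methods, and the handle properties are hypothetical and are not taken from UGA.

```python
class Prototype:
    """Object that delegates missing properties to its parent at run time."""

    def __init__(self, parent=None, **own):
        self.parent = parent
        self.own = dict(own)

    def get(self, name):
        node = self
        while node is not None:
            if name in node.own:
                return node.own[name]
            node = node.parent  # not found locally: ask the parent
        raise KeyError(name)

    def set(self, name, value):
        self.own[name] = value


base_handle = Prototype(color="grey", length=1.0)
x_handle = Prototype(parent=base_handle, axis="x")
y_handle = Prototype(parent=base_handle, axis="y", color="green")

# A child acting as a controller changes the shared parent, and in doing so
# changes itself and its sibling, with no recompilation step.
x_handle.parent.set("length", 2.5)
```

Properties a child overrides (such as the green color above) are unaffected by changes to the parent, while everything it inherits follows the parent immediately.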

4  Examples  of  3D  Widgets  in  Our  System

Our user interface group has developed several simple 3D widgets
in our framework. Some of these, such as the virtual sphere and the
cone tree, duplicate other researchers' widgets; others are experiments
with new paradigms for 3D user interfaces. We present these
widgets below, explaining the design process we used in creating
them, and stress the progress made possible by rapid prototyping.

4.1  A  virtual  sphere

A  virtual  sphere  rotation  widget  can  be  handled  by  a  simple  two-state 
ATN  (Figure  1).  The  ATN  processes  mouse  motion,  passing  the 
mouse  positions  to  a  function  that  maps  the  2D  mouse  coordinates 
into  another  object’s  space,  in  this  case  producing  a  point  on  the 
surface  of  a  sphere.  The  deltas  between  a  series  of  these  projections 
produce  rotations.  We  can  easily  change  the  kind  of  object  that 
mouse  coordinates  are  mapped  to,  so  as  to  produce  a  “virtual  cube” 
or  "virtual  donut."  This  sort  of  modification  of  the  interface  can  be 
done at run time.


[Figure 1 diagram: two states, Start State and Rotate State, with arcs labeled "start rotation on mouse down", "while mouse not up, rotate", and "finish rotation on mouse up"]

Figure 1: A two-state ATN for virtual sphere rotation


4.2  Handles

Object handles [6] are a 3D widget that contains more visual
geometry than the virtual sphere widget. We can build handles with an
arbitrarily complex appearance. Once they are built, we are free to
establish dependencies on them or use them as a controller. Color
Plate I shows various handles being used to translate, rotate and
scale an object.

The same kind of constrained motion can be produced by holding
down various modifier keys or different combinations of buttons
[20]. However, a user presented with such an interface has no
easy way to determine what the possible actions are. Handles allow
constrained motion through intuitive direct manipulation: when a
particular handle is selected, motion is constrained along or around
the axis it describes. For example, clicking on an object-space
translation handle located along an object's x axis limits translation
to the x axis.

The  visual  feedback  of  a  widget  can  range  from  the  direct  move¬ 
ment  of  the  selected  object  to  more  complex  widgets,  such  as  han¬ 
dles  that  include  numerical  output  and  other  quantitative  indicators. 
Because  our  system  provides  rich  support  for  geometry,  the  same 
set of primitives used for the application can be used to assemble
widgets and their visual feedback. The behavior of handles can
be produced without the corresponding geometry. An example is
the creation of "hot spots" on an object that may or may not have
a  visual  indication.  The  behavior  of  a  virtual  sphere  can  in  turn 
be  augmented  with  geometry  —  for  instance,  a  semi-transparent 
sphere  can  be  placed  around  the  object  during  rotation  to  convey 
the  behavior  of  the  widget  to  the  user  more  effectively.  The  flexi¬ 
bility  of  the  system  allows  the  widget  designer  and  user  to  explore 
a  wide  range  of  options. 

4.3  Snapping 

With a more intricate ATN (Figure 2) we can perform simple
snap-dragging [2]. A mouse's coordinates are used to generate a ray
from the camera through the projection of the mouse's position
onto the viewplane. If this ray intersects an object, the ATN lets
the user choose a point on an object to snap to a point on another
object. Since this is done with ray intersection, the point to snap
includes a complete Frenet frame [15] defined by the surface normal
and tangents. When the user releases the mouse button and clicks
again, the ATN begins checking to see if the ray specified by the
mouse intersects another object. If so, this new object becomes the
object to snap to. Again, the user can choose exactly which point
to use, including the entire Frenet frame. When the user has chosen
both points, the widget produces a transformation to align the two
frames, and applies it to the first object.


[Figure 2: A four-state ATN for interactive snapping. States: Start,
SnapPoint, ChooseGoal, PerformSnap. Transition labels: "mouse down,
select 1st object"; "mouse motion, update snap point"; "mouse motion,
rubber band"; "mouse down, select 2nd object, choose point to snap
to"; "mouse motion, update point to snap to"; "mouse up, snap
objects".]

By changing the states in the ATN, the user can experiment with
different  ways  of  specifying  snap-dragging.  Several  different  ATNs 
for  different  snapping  techniques  can  be  concurrently  developed 
and  experimented  with,  even  at  run  time.  For  example,  a  user 
could  develop  a  more  complex  ATN  to  allow  the  specification  of 
the  distance  between  the  surfaces  as  well  as  the  relative  orientation 
of  the  Frenet  frames. 
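
The final step of the snapping widget, producing a transformation that aligns one frame with another, can be illustrated as follows. This is a sketch under stated assumptions rather than the system's actual code: each frame is taken to be an origin plus three orthonormal axes (tangent, bitangent, normal), and the helper names `frame_matrix`, `rigid_inverse` and `snap_transform` are hypothetical. The aligning transform is the goal frame composed with the inverse of the snap frame.

```python
def frame_matrix(origin, tangent, bitangent, normal):
    """4x4 matrix whose columns are the three frame axes and the origin."""
    cols = (tangent, bitangent, normal, origin)
    rows = [[float(cols[j][i]) for j in range(4)] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(m):
    """Invert a rigid transform (orthonormal axes plus translation)."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]   # transpose
    t = [-sum(r_t[i][k] * m[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def snap_transform(snap_frame, goal_frame):
    """Transform that carries snap_frame exactly onto goal_frame."""
    return matmul(goal_frame, rigid_inverse(snap_frame))


def apply(m, p):
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Frame chosen on the first object: at (1, 0, 0) with world-aligned axes.
snap = frame_matrix((1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
# Frame chosen on the second object: at (0, 2, 0), turned 90 degrees about z.
goal = frame_matrix((0, 2, 0), (0, 1, 0), (-1, 0, 0), (0, 0, 1))

M = snap_transform(snap, goal)          # apply this to the first object
moved_origin = apply(M, (1.0, 0.0, 0.0))   # snap point lands on goal point
moved_tip = apply(M, (2.0, 0.0, 0.0))      # one unit along the tangent
```

A variant that also specifies a distance between the surfaces, as suggested above, could insert a translation along the goal normal between the two factors.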

4.4  A  color  picker 

Color spaces are inherently multidimensional. To illustrate these
spaces we can build a color picker in three dimensions and show
how changes in the values affect the output color. Color Plate II
shows two interactive views of RGB color space and one interactive
view of HSV color space. One view of RGB space is built with three
sliders, each of which was specified using dependencies. Another
view is built using a cubical marker that can translate within the
bounds of a unit cube. Here, each axis of the cube's position
represents a component of the color value. Thus, all three components
can be specified simultaneously using 3D gestural translation. The
third view is of HSV space. As in the RGB cube, the position of
the spherical marker in the center represents the three components
of the HSV color. The constraints on the sphere permit it to move
around in the cone that represents valid HSV color values.

All of the spaces are different visualizations of the same data,
kept consistent through the use of dependencies. Thus, a user can
choose a color in one view and see how that color is represented in
the other two. As the user interactively chooses a color, the other
two color representations update accordingly. Users familiar with
the RGB space can learn about the nature of HSV space by watching
the motion of the HSV indicator as they move the RGB indicator.
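
The idea of several views kept consistent through dependencies can be approximated with a small observer pattern. The sketch below is only a stand-in for the system's dependency mechanism: the class `SharedColor`, its method names, and the `views` dictionary are invented for illustration, and the color conversion uses Python's standard `colorsys` module.

```python
import colorsys


class SharedColor:
    """One color value that several views depend on."""

    def __init__(self, rgb=(0.0, 0.0, 0.0)):
        self._rgb = rgb
        self._dependents = []

    def depend(self, callback):
        """Register a view; it is updated immediately and on every change."""
        self._dependents.append(callback)
        callback(self._rgb)

    def set_rgb(self, r, g, b):
        self._rgb = (r, g, b)
        for callback in self._dependents:
            callback(self._rgb)

    def set_hsv(self, h, s, v):
        self.set_rgb(*colorsys.hsv_to_rgb(h, s, v))


views = {}
color = SharedColor()
# Three views of the same data: sliders, cube marker, and HSV cone marker.
color.depend(lambda rgb: views.__setitem__("sliders", rgb))
color.depend(lambda rgb: views.__setitem__("cube_marker", rgb))
color.depend(lambda rgb: views.__setitem__("cone_marker",
                                           colorsys.rgb_to_hsv(*rgb)))

color.set_rgb(1.0, 0.0, 0.0)            # user drags the RGB cube marker
red_in_hsv = views["cone_marker"]

color.set_hsv(0.5, 1.0, 1.0)            # user drags the HSV cone marker
cyan_in_rgb = views["sliders"]
```

Whichever view the user manipulates, the other two are refreshed from the same underlying value, so no view can drift out of step with the others.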
4.5  The rack

Recall that the ATN states are first-class objects and that our system
provides hierarchical grouping of objects. An ATN can pass control
to another ATN through dependencies and controller mechanisms.
Thus, pre-existing ATNs can be grouped together to form a more
complex, hierarchical ATN (see Figure 3) that controls the sequencing
of the lower-level ATNs. In other words, we can build more
complex widgets out of pre-existing widgets.

To construct a more complex widget, we start with the simple
rotation and translation handle widgets discussed in Section 4.2.
By rearranging them and changing their connections, we combine
them to form a "rack" for specifying high-level deformations such
as twists, tapers and bends [1], shown in Color Plate IV.

Different handles specify the parameters to three deformations.
The distance between the two upright handles specifies the range
over which the deformation applies. The angle of the red handle
on the end indicates the amount of bend, and the angle of the pink
handle indicates the amount of twist, while the height of the blue
handle indicates the amount of taper. By reconfiguring the rack,
changing the number of handles and their respective behaviors, the
user can control how the deformation is specified. Specialized racks
that only bend, taper, or twist can be easily built. A new rack can be
designed to apply wave deformations, or to allow both geometric
transformations and nonlinear deformations at the same time.

Textual specification of a bend deformation requires four floating-point
values and two vectors. The rack specifies all of these visually.
The major axis of the rack specifies one vector, and the red handle
specifies another vector, determining the angle and direction in
which the object should bend. The floating-point values are all
specified by how much particular handles are moved.

[Figure 3: Several ATNs can be combined to form a more complex
widget. This widget specifies high-level deformations. Legible
label: "Translate widget used for taper".]

The rack is a widget that provides a more meaningful interface to
complex deformations than a conventional widget such as a panel of
independent sliders. Such a panel provides no semantic correlation:
the user must extrapolate a single deformation from multiple
independent slider positions. Thus, the rack serves to abstract out the
essential characteristics of a deformation. When handles are used to
translate an object in its own object space, the handles themselves
give the user feedback on the orientation of that space, which might
not be apparent from the object itself. Similarly, an object being
deformed with the rack may be so geometrically complex that it has
no clear axis around which to twist, bend or taper. The rack provides
this axis, along with immediate and understandable feedback about
the magnitude and effects of the deformations.

4.6  The cone tree

More  complicated  metaphors  for  3D  interfaces  can  be  constructed 
and  experimented  with  in  our  system.  A  large  number  of  rotation 
widgets  can  be  assembled  into  a  Xerox  PARC-style  cone  tree.  Here, 
we use the cone tree to display the hierarchy of a 3D model (Color
Plate III). The cone tree is itself an object in the system and can be
freely  manipulated  as  a  whole. 

The  nature  of  this  widget  inherently  requires  motion  control  to 
animate  the  rotation  of  the  subtrees.  When  we  modify  the  cone 
tree,  we  can  affect  the  underlying  geometric  hierarchy  it  represents. 
Moving subtrees of the cone tree to other nodes in the tree affects
the  hierarchy  of  the  model  that  the  cone  tree  represents.  If  we  use 
other  tools  to  modify  the  hierarchy,  the  cone  tree's  structure  is  also 
updated. 

Since  the  cone  tree  is  itself  a  widget,  we  can  combine  it  with 
other  widgets  to  make  more  intricate  information  browsers,  much 
as  simple  rotation  and  translation  widgets  were  composed  above  to 
make  a  deformation  editor.  We  plan  to  explore  using  cone  trees  to 
represent  portions  of  a  hypermedia  graph  that  are  primarily  hierar¬ 
chical  but  have  some  cross-links,  e.g.,  a  multimedia  technical  paper 
with its various sections, subsections, references, and see-also's.

5  Conclusions 

5.1  Accomplishments 

We  have  presented  a  concept  of  3D  widgets  as  first-class  objects  en¬ 
capsulating  behavior  and  geometry  that  can  be  treated  as  any  other 
objects  in  a  3D  world.  Their  behaviors  may  be  defined  using  com¬ 
plex  control  methods  and  user  input  techniques.  We  have  provided 
a  first  implementation  of  these  widgets  within  the  UGA  system. 
Widgets can be rapidly prototyped, modified, and combined into
more  complicated  systems  of  widgets.  Close  integration  with  the 
application  allows  rich  forms  of  interaction  and  feedback  in  our  3D 
applications. 

5.2  Future  work 

Constructing  3D  widgets  is  reasonably  fast  with  our  system.  How¬ 
ever,  widget  designers  at  present  must  be  experts  in  the  use  of  UGA. 
We  hope  to  make  specifying  3D  widgets  even  more  natural  and  in¬ 
tuitive  than  it  is  now,  so  that  a  far  less  technically  expert  designer 
can  implement  3D  widgets.  Part  of  the  complexity  stems  from  lim¬ 
itations  of  dependencies.  We  might  address  these  limitations  with 
a  more  generic  constraint  model  at  the  basic  system  level,  making 
it  easier  to  specify  some  of  the  complex  relationships  of  3D  wid¬ 
gets.  In  addition,  our  system  does  not  run  as  fast  as  we  would 
like,  even  on  today's  high-end  platforms.  A  large  portion  of  time 
is  spent  evaluating  dependencies.  Unfortunately,  the  addition  of  a 
more  generic  constraint  model  is  not  likely  to  help  performance. 
Thus,  dependencies  merit  a  close  look,  at  both  the  conceptual  and 
the  implementation  level. 

We would like to continue developing individual widgets and
exploring the potential of various techniques from the world of 3D
graphics in interface design. We want to investigate the use of
more sophisticated motion control, modeling and rendering techniques
for 3D widgets. We can foresee widgets that will use
dynamic constraints, physical simulation, volumetric techniques,
particle systems, and even radiosity. Our application framework
already includes many of these techniques, so it is simply a matter
of their imaginative application in our system to make use of such
techniques in 3D interfaces.

In addition, we are in the process of constructing full 3D applications
and interfaces with the system presented. We believe the
unusual nature of our widgets will provide some interesting avenues
of exploration. Since the widgets are as much a part of the application
as the application itself, it is straightforward to manipulate
widgets with widgets. In other words, a user interface can be built
by starting with simple widgets and using them to bootstrap more
complex ones.

Finally, we hope to develop a high-level UIDS (user interface
design system) [5] for our system. As previously noted, our system
currently has no tools for making high-level specifications of an
interface.  Most  commercial  UIMSs,  having  been  built  on  top  of  a 
widget  toolkit,  focus  on  appearance  and  geometry  of  widgets.  Some 
research-level  UIDSs  handle  behavior  and  sequencing.  A  UIDS 
suitable  for  our  system  would  clearly  have  to  be  able  to  handle  full 
application  behavior  and  would  perhaps  be  an  Application  Design 
System,  a  full-fledged  programming  environment  for  3D  interactive 
applications. 
ABSTRACT

In a virtual world viewed with a head-mounted display, the user
may wish to perform certain actions under the control of a
manual input device. The most important of these actions are
flying through the world, scaling the world, and grabbing
objects. This paper shows how these actions can be precisely
specified with frame-to-frame invariants, and how the code to
implement the actions can be derived from the invariants by
algebraic manipulation.

INTRODUCTION

Wearing a Head-Mounted Display (HMD) gives a human user
the sensation of being inside a three-dimensional,
computer-simulated world. Because the HMD replaces the sights and
sounds of the real world with a computer-generated virtual
world, this synthesized world is called virtual reality.

The virtual world surrounding the user is defined by a graphics
database called a model, which gives the colors and coordinates
for each of the polygons making up the virtual world. The
polygons making up the virtual world are normally grouped
into entities called objects, each of which has its own location
and orientation. The human being wearing the HMD is called
the user, and also has a location and orientation within the
virtual world.

To turn the data in the model into the illusion of a surrounding
virtual world, the HMD system requires certain hardware
components. The tracker measures the position and
orientation of the user's head and hand. The graphics engine
generates the images seen by the user, which are then displayed
on the HMD. The manual input device allows the user to use
gestures of the hand to cause things to happen in the virtual
world.

BASIC ACTIONS

An action changes the state of the virtual world or the user's
viewpoint within it under control of a gesture of the hand, as
measured by the manual input device. The hand gesture
initiates and terminates the action, and the changing position
and orientation of the hand during the gesture is also used to
control what happens as the action progresses.

The manual input device may be a hand-held manipulandum
with pushbuttons on it, or it may be an instrumented glove. In
either case, the position and orientation of the input device
must be measured by the tracker to enable manual control of
actions. The input device must also allow the user to signal to
the system to start and stop actions, and to select among
alternative actions.

Certain fundamental manually-controlled actions may be
implemented for any virtual world. These actions involve
changing the location, orientation or scale of either an object
or a user, as shown in Table 1.

             User                          Object
Translate    fly through the world         grab (and move) object
Rotate       tilt the world                grab (and turn) object
Scale        expand or shrink the world    scale object

Table 1. Basic actions

Flying is defined here as an operation of translating in the
direction pointed by the hand-held input device, with steering
done by changing the hand orientation. This is different from
the type of flying available in a flight simulator, where the user
can not only translate but can also cause the virtual world to
rotate around him by banking. However, translation-only
flying is appropriate for a HMD because the user has the ability
to turn and look in any direction, and to point the input device
in any direction. We believe that keeping the orientation of
the virtual world locked to that of the real world helps the user
to navigate while flying through the virtual world.
Tilting the world is the ability to re-orient the virtual world
relative to the user's orientation; that is, to turn the
surrounding virtual world sideways. This is implemented by
rotating the user with respect to the virtual world, which is
subjectively perceived by the user as the entire virtual world
rotating around him.

Scaling the world is the capability to shrink or expand the
world relative to the user, as occurs to Alice in Wonderland
when she drinks from the little bottle or eats the little cake. By
setting up the action code properly, the user can shrink and
expand the world while manually steering the center of
expansion. This enables a powerful method of travel in very
large virtual worlds: the user shrinks the world down until the
destination is within arm's reach and then expands the world,
continuously steering the center of expansion so as to arrive at
the correctly-scaled destination.

Grabbing an object is picking up and moving a simulated
object that appears in the virtual world. By analogy with
real-world grabbing of objects, this includes the ability to rotate
the held object before releasing it.

Scaling  an  object  is  just  shrinking  or  expanding  an  individual 
object  alone. 

This paper seeks to answer the following question: How can the
basic actions of flying, grabbing, scaling and tilting in a HMD
system be specified and implemented?

PRIOR  WORK 

The first HMD was built in 1968 by Ivan Sutherland [8], but
since it had no manual input device other than a keyboard, it did
not allow actions controlled by manual gestures. At the
University of Utah, a tracked manual input device called a
"wand" was added to the system [9]. The tip of the wand was
tracked in position but not orientation. The wand was used to
deform the surfaces of virtual objects composed of curved
patches [2].

In 1985 at NASA Ames Research Center, McGreevy and
Humphries built a HMD which was later improved by Fisher,
Robinett and others [3]. Under contract to NASA, VPL
Research provided an instrumented glove, later named the
"DataGlove," which served as a manual input device. The
position of the hand and head were tracked with a Polhemus
3Space magnetic tracker. In 1986 using the glove input
device, Robinett implemented on this system the actions of
flying through the world, scaling the world, rotating the world,
and grabbing objects.

Some of these actions, particularly flying and grabbing
objects, have since been implemented on HMD systems at
several sites. VPL Research began in 1989 selling
commercially a HMD system that used a glove to control the
actions of flying and grabbing [1]. At the University of North
Carolina [7][5], the actions of flying, scaling and grabbing
were controlled with a hand-held manual input device with
pushbuttons on it which was made from a billiard ball.

COORDINATE  SYSTEMS  DIAGRAM  FOR  A  HMD 

Various coordinate systems co-exist within a HMD system. All
of these coordinate systems exist simultaneously, and although
over time they may be moving with respect to one another, at
any given moment each pair of them has a relative position and
orientation. The instantaneous relationship between two
coordinate systems can be described with a transform that
converts the coordinates of a point described in one coordinate
system to the coordinates that represent that same point in the
second coordinate system.

Although transforms exist between any pair of coordinate
systems in the HMD system, certain pairs of coordinate
systems have relative positions that are either constant,
measured by the tracker, or known for some other reason.
These are the independent transforms, which are shown in
relation to one another in Figure 1. In this diagram, each node
stands for a coordinate system, and each edge linking two
nodes stands for a transform between those two coordinate
systems.

[Figure 1. Coordinate systems diagram for a single-user
HMD system. Legible edge annotation: "modified when user flies".]

NOMENCLATURE  FOR  TRANSFORMS 

We abbreviate the coordinate systems with the first letters of
their names. The World-Object transform may be written as
$T_{WO}$. Transform $T_{WO}$ converts a point $P_O$ in coordinate
system O to a point $P_W$ in coordinate system W.

$$P_W = T_{WO} \cdot P_O$$

This notation is similar to that used in [4]. Notice that the
subscripts cancel nicely, as in [6]. Likewise, the composition
of the transform $T_{WO}$ going from O to W with the transform
$T_{RW}$ going from W to R gives a transform $T_{RO}$ from O to R,
with the cancellation rule working here, too:

$$T_{RW} \cdot T_{WO} = T_{RO}$$

The inverse of transform $T_{WO}$ is written $T_{OW}$.
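
The subscript-cancellation rule lends itself to a mechanical check. The following sketch is an illustration only: the `Transform` class and the `translation` helper are hypothetical names, not part of any HMD software described here. Each transform carries its two subscripts, and composition is refused when the adjacent subscripts do not match.

```python
class Transform:
    """A 4x4 matrix tagged with subscripts: T_{dst src} maps src to dst."""

    def __init__(self, dst, src, matrix):
        self.dst, self.src, self.m = dst, src, matrix

    def __matmul__(self, other):
        # Adjacent subscripts must cancel, e.g. T_RW @ T_WO gives T_RO.
        if self.src != other.dst:
            raise ValueError(
                f"cannot compose T_{self.dst}{self.src} with "
                f"T_{other.dst}{other.src}: inner subscripts differ")
        m = [[sum(self.m[i][k] * other.m[k][j] for k in range(4))
              for j in range(4)] for i in range(4)]
        return Transform(self.dst, other.src, m)

    def apply(self, point):
        v = (point[0], point[1], point[2], 1.0)
        return tuple(sum(self.m[i][k] * v[k] for k in range(4))
                     for i in range(3))


def translation(dst, src, tx, ty, tz):
    """Hypothetical helper: a pure translation from src to dst coordinates."""
    return Transform(dst, src, [[1.0, 0.0, 0.0, tx],
                                [0.0, 1.0, 0.0, ty],
                                [0.0, 0.0, 1.0, tz],
                                [0.0, 0.0, 0.0, 1.0]])


T_WO = translation("W", "O", 1.0, 2.0, 3.0)    # Object to World
T_RW = translation("R", "W", 10.0, 0.0, 0.0)   # World to Room
T_RO = T_RW @ T_WO                             # Object to Room
P_R = T_RO.apply((0.0, 0.0, 0.0))              # object origin, in Room
```

Writing the factors in the wrong order, `T_WO @ T_RW`, raises a `ValueError` because the inner subscripts O and R do not match, which is the kind of misordering the notation is meant to expose.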

SPECIFYING  ACTIONS  WITH  INVARIANTS 

An  action  in  a  virtual  world  is  performed  by  activating  the 
input  device,  such  as  by  pushing  a  button,  and  then  moving  the 
input  device  to  control  the  action  as  it  progresses.  As  an 
example,  grabbing  a  simulated  object  requires,  for  each  frame 
while the grab action is in progress, that a new position for the
object  be  computed  based  on  the  changing  position  of  the 
user’s  hand. 

It is possible to precisely define grabbing and other actions
with an invariant, which is an equation that describes the
desired relationship among certain transforms involved in the
action. The invariant is typically stated as a relation between
certain transforms in the current display frame and certain
transforms in the previous frame. In the case of grabbing, the
invariant to be maintained is that the Object-Hand transform be
equal to its value in the previous frame while the grab action is
in progress: in other words, that the object remain fixed with
respect to the hand while it is being grabbed.

Starting from the invariant and a diagram of the coordinate
systems involved, a mathematical derivation can be performed
which produces a formula for updating the proper transform to
cause the desired action to occur. For grabbing, this would be
updating the Object-World transform to change the object's
position and orientation in the virtual world.

Rigorously deriving the update formula from a simple invariant
is much easier and more reliable than attempting to write down
the update formula using the coordinate systems diagram and
informal reasoning. Also, the matching of adjacent subscripts
in the notation helps to check that the transforms are in correct
order.

GRABBING  AN  OBJECT 

To derive the update formula for grabbing, we first look at the
relevant part of the coordinate system diagram, shown in
Figure 2.

[Figure 2. Legible annotations from the diagram: "held fixed during
grab", "modified by grab action", "object".]


A way of describing the action of grabbing is that the
Object-Hand transform $T_{OH}$ remain unchanged from frame to frame,
which is expressed by the invariant

$$T_{OH}' = T_{OH}$$

where the apostrophe in $T_{OH}'$ indicates a transform in the
current frame which is being updated, and no apostrophe means
the value of the transform from the previous frame.


To move an individual object, the Object-World transform $T_{OW}$
must be updated each frame in a way that preserves the
invariant. To derive the update formula for grabbing, we start
with the invariant and decompose the transforms on both sides
based on the relationships among the coordinate systems as
shown in the coordinate system diagram.

$$T_{OW}' \cdot T_{WR}' \cdot T_{RT}' \cdot T_{TH}' = T_{OW} \cdot T_{WR} \cdot T_{RT} \cdot T_{TH}$$

We then use algebraic manipulations to isolate the desired
transform on the left side of the equation, remembering that
these transforms are not commutative.

$$T_{OW}' \cdot T_{WR}' \cdot T_{RT}' = T_{OW} \cdot T_{WR} \cdot T_{RT} \cdot T_{TH} \cdot T_{HT}'$$

$$T_{OW}' \cdot T_{WR}' = T_{OW} \cdot T_{WR} \cdot T_{RT} \cdot T_{TH} \cdot T_{HT}' \cdot T_{TR}'$$

$$T_{OW}' = T_{OW} \cdot T_{WR} \cdot T_{RT} \cdot T_{TH} \cdot T_{HT}' \cdot T_{TR}' \cdot T_{RW}'$$

This  is  the  update  formula  for  grabbing,  which  updates  the 
Object-World  transform  based  on  its  previous  value,  the  current 
and  previous  values  of  the  Hand-Tracker  transform  (which 
changes  as  the  hand  moves),  and  the  values  of  the  intervening 
transforms  between  Tracker  and  World.  The  effect  of  executing 
this assignment each frame is to keep the object in a fixed
position  and  orientation  relative  to  the  hand,  even  though  the 
hand  is  moving  around  within  the  virtual  world. 
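
As a numerical check of the grabbing update, the sketch below applies the formula $T_{OW}' = T_{OW} \cdot T_{WR} \cdot T_{RT} \cdot T_{TH} \cdot T_{HT}' \cdot T_{TR}' \cdot T_{RW}'$ for one frame and confirms that the Object-Hand transform is unchanged afterwards. It is an illustration under assumptions, not the UNC software: transforms are plain 4x4 nested lists, the helper names are invented, the sample values are made up, and `inverse` handles only rotation, uniform scale and translation.

```python
import math
from functools import reduce


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def chain(*transforms):
    """Multiply transforms left to right (order matters)."""
    return reduce(matmul, transforms)


def inverse(m):
    """Invert a transform built from rotation, uniform scale, translation."""
    s2 = sum(m[i][0] ** 2 for i in range(3))                  # squared scale
    a = [[m[j][i] / s2 for j in range(3)] for i in range(3)]  # A^-1 = A^T/s^2
    t = [-sum(a[i][k] * m[k][3] for k in range(3)) for i in range(3)]
    return [a[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translate(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def grab_update(T_OW, T_WR, T_RT, T_TH, T_WR_new, T_RT_new, T_TH_new):
    """T_OW' = T_OW . T_WR . T_RT . T_TH . T_HT' . T_TR' . T_RW'."""
    return chain(T_OW, T_WR, T_RT, T_TH,
                 inverse(T_TH_new), inverse(T_RT_new), inverse(T_WR_new))


# Previous frame (all values are made up for the demonstration).
T_OW = translate(-1.0, 0.0, 0.0)
T_WR = translate(0.0, 2.0, 0.0)
T_RT = rot_z(0.3)
T_TH = translate(0.5, 0.0, 0.2)
# Current frame: only the hand has moved and turned.
T_TH_new = matmul(translate(0.7, 0.1, 0.2), rot_z(1.0))

T_OW_new = grab_update(T_OW, T_WR, T_RT, T_TH, T_WR, T_RT, T_TH_new)

# The invariant T_OH' = T_OH should hold after the update.
T_OH_old = chain(T_OW, T_WR, T_RT, T_TH)
T_OH_new = chain(T_OW_new, T_WR, T_RT, T_TH_new)
```

Because the hand rotates in this example, swapping any two factors in `grab_update` breaks the invariant, which shows why the order of the transforms matters.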

Another  action  which  can  be  implemented  in  a  similar  manner 
is  “grabbing  the  fabric  of  space.”  In  this  case,  the  user  can 
grab  and  tilt  the  entire  virtual  world,  rather  than  just  a  single 
object,  by  holding  the  World-Hand  transform  invariant  while 
the  hand  rotates. 

FLYING 

The action of flying is translating the user through the virtual
world in the direction pointed by the manual input device. The
user steers by rotating the manual input device as the flight
proceeds. A metaphor for this type of flying is that the user
holds a rocket pistol in his hand, which drags him through the
virtual world when he squeezes the trigger.

The manual input device is considered to point in a particular
direction that is relative to its local coordinate system. This
may be thought of as a 3D vector in Hand coordinates, where
the vector's length specifies the flying speed and the vector's
direction defines the direction the input device points. This
vector defines a translation transform, $T_{\mathrm{Translate}_H}$, which
moves a point in Hand coordinates to a new position in Hand
coordinates. To implement flying, we first need to convert this
transformation to operate on points in Room coordinates.

$$T_{\mathrm{Translate}_R}' = T_{RH}' \cdot T_{\mathrm{Translate}_H} \cdot T_{HR}'$$

To make the user's position change within the virtual world,
the World-Room transform must be modified each frame, so the
invariant for flying is

$$T_{WR}' = T_{WR} \cdot T_{\mathrm{Translate}_R}'$$

which may be expanded to give the update formula for flying.

$$T_{WR}' = T_{WR} \cdot T_{RT}' \cdot T_{TH}' \cdot T_{\mathrm{Translate}_H} \cdot T_{HT}' \cdot T_{TR}'$$
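
A small worked example may help make the flying update concrete. The sketch below is illustrative only: the helper names, the sample transforms, and the choice to treat the pointing vector as a per-frame step are assumptions. It evaluates $T_{WR}' = T_{WR} \cdot T_{RT}' \cdot T_{TH}' \cdot T_{\mathrm{Translate}_H} \cdot T_{HT}' \cdot T_{TR}'$ for a hand whose x axis has been turned to point along the room's y axis.

```python
import math
from functools import reduce


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def chain(*transforms):
    return reduce(matmul, transforms)


def rigid_inverse(m):
    """Invert a rotation-plus-translation transform."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(r_t[i][k] * m[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translate(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def fly_update(T_WR, T_RT_new, T_TH_new, step_H):
    """T_WR' = T_WR . T_RT' . T_TH' . T_Translate_H . T_HT' . T_TR'."""
    T_translate_H = translate(*step_H)   # pointing vector in Hand coordinates
    return chain(T_WR, T_RT_new, T_TH_new, T_translate_H,
                 rigid_inverse(T_TH_new), rigid_inverse(T_RT_new))


T_WR = translate(0.0, 0.0, 0.0)     # room starts at the world origin
T_RT = translate(0.0, 0.0, 1.0)     # tracker mounted one unit up (made up)
T_TH = rot_z(math.pi / 2)           # hand x axis points along room +y

T_WR_new = fly_update(T_WR, T_RT, T_TH, (1.0, 0.0, 0.0))
room_origin_in_world = tuple(T_WR_new[i][3] for i in range(3))
```

The room origin moves one unit along world +y, the direction in which the hand points, and the rotational part of $T_{WR}$ is untouched, consistent with translation-only flying.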

SCALING  THE  WORLD 

It is possible to shrink or expand the surrounding virtual world.
This is comprehensible and effective because the user has direct
perception of the size of and distance to virtual objects through
stereopsis and head-motion parallax, and can therefore easily
perceive the concerted motions of the objects in the virtual
world expanding around a center of expansion, or shrinking
towards a center of contraction.

The type of scaling used is uniform scaling, in which all three
dimensions are always scaled by the same factor. There is
always a center of scaling when uniform scaling occurs, and for
the manually controlled action of scaling the world, it makes
sense to locate the center of scaling at the user's hand. When
expanding the world, the center of scaling is the point that
virtual objects move away from as expansion occurs, and so to
end up at a specific desired location within a formerly-tiny
virtual world, the center of scaling must be repeatedly
re-centered on the desired location as it emerges during
expansion.

Implementing this action requires a derivation similar to that
used for flying. An incremental scaling transformation in Hand
coordinates, $T_{\mathrm{Scale}_H}$, has the Hand origin as its center of
scaling. Below we give the invariant for scaling the world, and
the update formula derived from it.

$$T_{WR}' = T_{WR} \cdot T_{\mathrm{Scale}_R}'$$

$$T_{WR}' = T_{WR} \cdot T_{RT}' \cdot T_{TH}' \cdot T_{\mathrm{Scale}_H} \cdot T_{HT}' \cdot T_{TR}'$$

GENERAL  FORM 

U|X)n  examining  the  invariants  for  flying  and  scaling,  we  see  a 
strong  similarity  between  them;  both  invariants  arc  of  die 
form: 

Twr'  =  T\vr  •  TK<,ra,ufnnn>R' 

In fact, these two invariants for updating are examples of a
more general technique for updating a transform between two
coordinate systems based on a transform that occurs in a third
coordinate system. The general form for updating the
transform $T_{AB}$ in terms of an action in coordinate system K is:

$$T_{AB'} = T_{AB} \cdot T_{BK} \cdot T_{K\langle\mathrm{transform}\rangle K'} \cdot T_{KB}$$

where there may be an arbitrary number of coordinate systems
between B and K, and $T_{BK}$ is the product of the transforms that
go between the two coordinate systems.

Using this general form, scaling an object about the hand is
analogous to scaling the world about the hand:

$$T_{OW'} = T_{OW} \cdot T_{WH} \cdot T_{H\,\mathrm{scale}\,H'} \cdot T_{HW}$$
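The update pattern described in this section, composing a transform with an action expressed in a third coordinate system, can be sketched numerically. The following is a minimal illustration, not the UNC implementation. It assumes that a transform written T_AB maps B coordinates to A coordinates, that transforms are 4x4 homogeneous matrices, and that the hand pose is a pure translation so that its inverse is trivial. All function names and numeric values are hypothetical.

```python
# Sketch of the update T_AB' = T_AB * T_BK * (action in K) * T_KB,
# specialised to scaling the world about the hand (A = World, B = Room, K = Hand).
# Assumptions: 4x4 homogeneous matrices as nested lists; T_AB maps B coordinates
# to A coordinates; the hand pose is a pure translation (no rotation).

def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def uniform_scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

def apply(m, p):
    """Transform the 3D point p by the homogeneous matrix m."""
    v = (p[0], p[1], p[2], 1)
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return (out[0], out[1], out[2])

def update_transform(t_ab, t_bk, action_in_k, t_kb):
    """Return T_AB' = T_AB * T_BK * action_in_k * T_KB."""
    return matmul(matmul(matmul(t_ab, t_bk), action_in_k), t_kb)

t_wr = translation(10, 0, 0)    # Room -> World (hypothetical values)
t_rh = translation(1, 2, 3)     # Hand -> Room
t_hr = translation(-1, -2, -3)  # Room -> Hand (inverse of t_rh)

# Scale by a factor of 2 about the hand origin.
t_wr_new = update_transform(t_wr, t_rh, uniform_scale(2.0), t_hr)

# The hand origin, expressed in World coordinates, before and after the update.
hand_before = apply(matmul(t_wr, t_rh), (0, 0, 0))
hand_after = apply(matmul(t_wr_new, t_rh), (0, 0, 0))
```

Because the scaling matrix leaves the Hand origin fixed, the hand maps to the same World point before and after the update, while every other Room point moves directly away from it. This is the center-of-scaling behavior the text describes.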

CONCLUSIONS 

The foregoing examples of grabbing, flying and scaling show
how actions can be implemented that operate under continuous
manual control by the user. For each action, the relationship
between the motion of the hand and the transforms to be
modified was precisely specified with an invariant. These
invariants not only provided a concise and precise
specification of each action, but also provided a starting point
for a formal derivation that produced update equations which
could be used directly to implement the actions.

Using invariants and derivations to produce the code to
implement grabbing, scaling and flying is greatly superior to
the method which is often used, namely, to just write down a
sequence of transforms that looks right based on the coordinate
system diagram. It is easy to get some of the transforms in the
wrong order. The notation used in this paper provides a check
against misordering the transforms by requiring adjacent
subscripts to match. The HMD software at UNC was
implemented using this notation and the formulas derived in
this paper, and serves as proof that they work.
ABSTRACT

A controlled experiment was conducted to compare head-tracked
and non-head-tracked steering modes in the performance of an
abstract beam-targeting task. Collected data revealed a wide variety
of mode preferences among the subjects. Subject performance, as
measured by final score, task completion time and subject
confidence, differed very little between the head-tracked steering modes
taken as a group and the collective non-head-tracked modes. Some
significant differences were observed between individual steering
modes, both within and between the head-tracked and
non-head-tracked groups.

INTRODUCTION 

Current research at the University of North Carolina at Chapel Hill
is investigating the possible benefits to be gained by applying
head-mounted display (HMD) technology to radiotherapy treatment
planning (RTP). Use of a head-mounted display for targeting of treatment
beams suggests several possible steering modes for exploring the
virtual world of the patient's anatomy. To determine which steering
mode is best suited to our application, a user study was conducted to
investigate the relative merits of the different steering modes.

Seven steering modes were used in an abstract beam-targeting task.
Four modes used head-tracking information, while the other three
modes did not. It was anticipated that head-tracking would provide
an advantage in beam targeting through more natural steering and
navigation that makes use of proprioceptive and vestibular
information, which are absent in non-head-tracked methods.

Related work in movement through a virtual world is described in [1,
2, 3], but these studies do not deal with HMDs and head-tracking.

BEAM  TARGETING 

The key to successful beam targeting in radiation therapy treatment
planning is to orient and shape the beams so that the entire tumor is
covered by each beam while as little of the healthy surrounding tissue
as possible is hit by the beams. Given the complex spatial
arrangement of a patient's anatomy (tumors may be draped around healthy
organs or have tendrils snaking out into the healthy tissue), this is
usually not an easy task.

To evaluate the different steering modes, subjects were presented
with an abstract anatomy model, consisting of a multicolored
spherical target (tumor analog) embedded in a collection of uniquely
colored monochromatic spherical dodges (organ analogs). (See
Color Plate 1.) The subjects used each of the seven steering modes
to manipulate the direction in which a conical virtual beam passed
through the model. The beam was defined such that its source (cone
vertex) was always a fixed distance from the target, its central ray
(cone axis) passed through the center of the target, and its divergence
was just large enough to encompass the target. The subject was
instructed to find the beam direction that afforded the smallest
volume of intersection between the conical beam and the dodges, a
task analogous to a radiotherapist trying to avoid radiosensitive
organs with a treatment beam.

STEERING  MODES 

The term “steering mode” refers to the method used to change one's
position or orientation in the virtual world. This is distinguished
from navigation, which refers to understanding one's current
position and orientation relative to other objects in the virtual world.

Head-Tracked 

These modes are linked to movement of the subject's head and enable
the subject to make use of vestibular (inner ear balance) and
proprioceptive (muscles, tendons, joints) senses for navigation.

Walkaround (WLK). In Walkaround mode, the subject physically
walks about in the virtual world containing the target/dodges model.
The direction of the beam is defined by the vector from the subject's
eyes to the center of the target. To better examine the model and
target the beam from above and below, the subject is given the ability
to vertically translate the model using a 6-D mouse. No other
manipulation of the model is possible.

Walkaround/Rotation (WKR). This is the same as Walkaround mode,
with the exception that the subject is also able to rotate the model
about any axis in 3-space through its center by grabbing with the
6-D mouse. (See 6-D Mouse section below.)

Orbital (ORB). In Orbital mode the subject is constrained to always
be looking at the center of the model from the beam source. Beam
direction coincides with gaze direction. Unlike the Walkaround
modes, Orbital mode uses only head orientation and ignores head
position. As the subject's head turns, the model is observed to
translate about the subject's head at a constant distance. (Hence the
name Orbital.) Because the model undergoes no rotation, it can be
viewed from any direction with a turn of the subject's head.
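The Orbital behavior can be sketched as placing the model center along the current gaze direction at a fixed distance from the eye, without ever rotating the model itself. This is only an illustration of the geometry described above. The yaw/pitch parameterization, the axis convention, and the function names are assumptions, not details taken from the study's software.

```python
import math

def gaze_direction(yaw, pitch):
    """Unit gaze vector for a head yaw (about the vertical y axis) and pitch,
    both in radians. Assumed convention: yaw = pitch = 0 looks down the -z axis."""
    return (-math.sin(yaw) * math.cos(pitch),
            math.sin(pitch),
            -math.cos(yaw) * math.cos(pitch))

def orbital_model_center(eye, yaw, pitch, distance):
    """Position of the model center in Orbital mode: always straight ahead of
    the eye at a constant distance. Head position is ignored, so `eye` is a
    fixed point; only head orientation (yaw, pitch) moves the model."""
    g = gaze_direction(yaw, pitch)
    return tuple(e + distance * c for e, c in zip(eye, g))

eye = (0.0, 0.0, 0.0)
center_ahead = orbital_model_center(eye, 0.0, 0.0, 2.0)
center_turned = orbital_model_center(eye, math.radians(40), math.radians(15), 2.0)
```

As the head turns, the model center sweeps over a sphere of radius `distance` around the eye. Because the model's own orientation never changes, different sides of it come into view.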

Immersion (IMM). In Immersion mode the subject views the model
looking outward from the center of the target. Like Orbital mode,
Immersion mode makes use of head orientation only and ignores
head position. When the subject's head turns, the subject's view
sweeps across portions of the model from its fixed, central vantage
point. The beam direction is defined by the subject's gaze direction,
and the task of finding the best beam orientation becomes one of
looking for the portion of the model with the biggest opening. Since
the beam passes completely through the model, the subject is given
the ability to reverse his gaze direction by holding down a button on
the 6-D mouse, and can thereby examine the complete prospective
beam path through the model.

Non-Head-Tracked 

Although these modes do not make use of head-tracking information,
the subjects still viewed the model through the HMD so that image
quality was equalized over the seven modes. These three modes all
place the subject's eye at the beam source, looking in the direction of
the beam toward the target, and support exploration of prospective
beam orientations by rotating the model in three-space.

Joystick (JOY). In Joystick mode the model is rotated with a
velocity-control joystick. In addition to the left-right/forward-backward
movement of the joystick, the cap of the joystick turns clockwise and
counterclockwise to provide all three degrees of rotational freedom.

Spaceball (SPC). In this mode the model is rotated with a Spaceball*,
an isometric, force-sensitive device that provides six degrees of
translational and rotational freedom. This mode, however, uses only
the three rotational degrees of freedom as a velocity control for
rotation of the model in three-space.

6-D Mouse (SDM). In 6-D Mouse mode the orientation of the model
is controlled with a custom-built, six degree-of-freedom mouse
(tracker sensor embedded in a pool ball with two buttons). When
either mouse button is held down, the rotational component of the
mouse movement is directly linked to model rotation, and the subject
sees the model rotate in the same manner as his hand.

BEAM'S-EYE VIEW

An important feature of a steering mode that may affect a subject's
performance is whether or not it provides a “beam's-eye view.”
Beam's-eye view is the view seen by an eye coincident with the beam
vertex and whose gaze vector coincides with the beam's central axis.
With a beam's-eye view it is very easy to determine which dodges
intersect the beam, for since the beam is defined to diverge just
enough to exactly enclose the target, the silhouettes of those dodges
will overlap with the silhouette of the target. In those modes that do
not provide a beam's-eye view, it is more difficult for the subject to
judge which dodges are hit by the beam.
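The silhouette argument can be made concrete. Seen from the beam source, each sphere covers a circular cap of directions, and a dodge is hit exactly when its cap overlaps the target's cap. The sketch below is a hedged illustration of that test and is not the scoring code used in the experiment, which measured intersection volume. It assumes the beam is an unbounded cone, that the source lies outside every sphere, and that all names and coordinates are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angular_radius(center, radius, eye):
    """Half-angle subtended by a sphere as seen from eye (eye outside the sphere)."""
    return math.asin(radius / norm(sub(center, eye)))

def angle_between(u, v):
    cos_t = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding error

def dodge_hit_by_beam(source, target_center, target_radius,
                      dodge_center, dodge_radius):
    """True if the dodge's silhouette overlaps the target's silhouette when
    viewed from the beam source, i.e. the dodge intersects the conical beam."""
    axis = sub(target_center, source)
    # The beam diverges just enough to enclose the target.
    beam_half_angle = angular_radius(target_center, target_radius, source)
    separation = angle_between(axis, sub(dodge_center, source))
    dodge_half_angle = angular_radius(dodge_center, dodge_radius, source)
    return separation < beam_half_angle + dodge_half_angle

source = (0.0, 0.0, 10.0)
target = ((0.0, 0.0, 0.0), 1.0)
in_front = dodge_hit_by_beam(source, *target, (0.0, 0.0, 5.0), 0.5)
behind = dodge_hit_by_beam(source, *target, (0.0, 0.0, -5.0), 0.5)
off_to_side = dodge_hit_by_beam(source, *target, (5.0, 0.0, 0.0), 0.5)
```

A dodge behind the target is reported as hit as well, which matches the statement that the beam passes completely through the model.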

Walkaround and Walkaround/Rotate modes do not provide
beam's-eye views, because the subject's head cannot be physically
constrained to align with the beam source. Immersion mode also does
not provide a beam's-eye view, since the subject's eyepoint is
constrained to stay at the target's center. The other four modes do
provide beam's-eye views.

EXPERIMENTAL METHOD

The experiment was a one-factor within-subject investigation, with
steering mode as the independent variable. Dependent variables
measured were final score (volume of intersection between beam and
dodges), task completion time, confidence in the final beam
configuration, and rank orderings of the seven modes by ease of use and by
preference.

Fourteen subjects were recruited from graduate students and staff
members of the Departments of Computer Science, Radiation
Oncology, and Radiology at UNC. Each subject underwent 7 sessions,
each of which used a different steering mode. The order of the
steering modes used by each subject was varied according to a 7x7
Latin square. Each session consisted of 3 practice trials followed by
3 test trials. Each trial used a unique target/dodge model.
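A Latin square ordering ensures that every steering mode appears once in every session position. The text does not say which 7x7 square was used or how its rows were assigned to the fourteen subjects. The sketch below assumes the simplest cyclic construction, with each row assigned to two subjects. The function names are hypothetical.

```python
MODES = ["WLK", "WKR", "ORB", "IMM", "JOY", "SPC", "SDM"]

def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclically shifting the item list.
    Each item occurs exactly once in every row and every column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def session_order(subject_index, square):
    """Order of steering modes for one subject (assumed: rows reused in turn)."""
    return square[subject_index % len(square)]

square = cyclic_latin_square(MODES)
orders = [session_order(s, square) for s in range(14)]
```

With fourteen subjects and seven rows, each ordering is used exactly twice under this assignment, so each mode is seen in each session position by two subjects.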

For each trial the subject explored prospective beam orientations until
the best one was found, at which time the subject stopped the trial.
There was no time limit, nor any emphasis on task completion time;
the subject was instructed to take as long as necessary to find the best
beam path. A virtual marker (arrow pointing through the model) was
provided to the subject to use as a reference. At any time the subject
could issue a “mark” command, which aligned the marker with the
current beam direction, and the marker would remain fixed in the
model until a subsequent command was issued. The score and task
completion time for the trial were recorded, as well as the subject's
rating on a scale of 1 (no confidence) to 10 (total confidence) of how
confident he or she was that the best beam orientation had been
found.

After all seven sessions were completed, the subject ranked the seven
steering modes according to two criteria: ease-of-use of the steering
mode and preference for performing the beam targeting task.

Equipment used included a Polhemus 3Space* tracker on an
EyePhone* Model 2 head-mounted display, displaying images
generated by UNC's Pixel-Planes 4 graphics processor.

RESULTS 

Figure 1 presents histograms showing, for each steering mode, the
number of times it was ranked 1st, 2nd, ... 7th by ease-of-use and by
preference. The plot for Walkaround mode shows that most subjects
found it to be one of the more difficult steering modes to use. The
other three head-tracking modes have somewhat flat histograms,
suggesting no general consensus on how easy they were to use. Of
the non-head-tracking modes, the Joystick mode is widely
considered an easy-to-use steering mode. Spaceball mode and 6-D Mouse
mode both tended to be on the difficult side.


The preference rankings show that Joystick mode was widely
preferred by the subjects, whereas Walkaround mode was widely
disliked. For Spaceball and 6-D Mouse modes subjects' opinions
were on the less-preferred side. On the other hand, Walk/Rotate and
Orbital modes were on the more-preferred side of the scale.
Immersion mode shows an interesting bimodal distribution, suggesting that
subjects either loved it or hated it.

In order to factor out inter-model variability, each trial's score was
normalized by the median score across subjects for the particular
model used in that trial. Figure 2 shows the distribution of the
logarithms of the normalized scores grouped by steering mode. As
the best possible score is 0 (no intersection between beam and
dodges), the more negative values represent better performance.
Student's t-test reveals significant differences for the following
inter-mode comparisons: IMM-ORB (α=0.0007), IMM-SPC
(α=0.0069), SDM-ORB (α=0.0126), WKR-ORB (α=0.0187), and
IMM-JOY (α=0.0197). Head-tracked modes (IMM, ORB, WKR, WLK)
taken as a group do not differ significantly from non-head-tracked
modes (JOY, SDM, SPC).
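The normalization step described above can be sketched as follows. The trial data here are invented purely for illustration and are not results from the study. The sketch also assumes natural logarithms and strictly positive scores. The text does not state the logarithm base, nor how a perfect score of 0, which has no logarithm, was handled.

```python
import math
from statistics import median

def log_normalized_scores(trials):
    """trials: list of (model_id, score) pairs, one per trial.
    Each score is divided by the median score obtained on the same model,
    then the logarithm is taken. Negative values are better than the median."""
    by_model = {}
    for model_id, score in trials:
        by_model.setdefault(model_id, []).append(score)
    medians = {m: median(scores) for m, scores in by_model.items()}
    return [math.log(score / medians[model_id]) for model_id, score in trials]

# Hypothetical intersection volumes for two models (not data from the paper).
trials = [("A", 2.0), ("A", 4.0), ("A", 8.0),
          ("B", 10.0), ("B", 30.0), ("B", 20.0)]
normalized = log_normalized_scores(trials)
```

Dividing by the per-model median puts easy and hard models on a common scale, so a trial scoring half the median intersection volume yields the same value whichever model was used.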

Figure 3 shows the distribution of the logarithms of the task
completion times grouped by steering mode. No significant differences are
found in this data, neither between individual steering modes nor
between head-tracked and non-head-tracked modes.

Figure 4 presents the distribution of subjects' confidence ratings
grouped by steering mode. The only significant effect found in this
data is the JOY-IMM comparison (α=0.0375).

Table 1 describes correlations between the dependent variables. Not
surprisingly, ease-of-use and preference rank are highly correlated.
Significant correlations are also found between subject confidence
and ease-of-use, preference, and elapsed time. All three correlations
are negative, indicating that a subject's confidence decreased when
using difficult steering modes or modes they did not like, or when
trials took a long time. Interestingly, score is not significantly
correlated with any of the other variables.

DISCUSSION 
Trial  Replay 

In addition to the statistical summaries presented above, a subjective
review of each trial was conducted by playing back log files in which
status information for the subject's head, the model and the beam
was recorded at half-second intervals. By observing the trial replay
with the HMD, it was possible to study how the subject moved and
how the model was manipulated. The trial replay also traced the
location of the beam source through the trial, quickly revealing which
beam directions were considered and, perhaps more important,
which directions were not considered.

In spite of being instructed to find the best possible beam direction,
subjects usually terminated the trial before considering all
possibilities. Presumably they were able to attain a good enough spatial
understanding of the model without having to inspect it from all
angles. Trials in which the model had been completely covered
were usually of extremely long duration, with subject
movement suggesting confusion and disorientation. In only a few cases
did subjects follow a systematic search strategy, and these systematic
searches would usually be abandoned after one candidate beam
direction had been found. For the most part, subjects followed what
might be called a “greedy” steering strategy, moving about the model
in a manner based upon their current view of the model, and not upon
some predefined plan. As a result, in most trials the traces of the beam
source showed large “holes” that were never considered. From just
watching the trial replay it is difficult to determine whether such
holes were areas that were deliberately skipped or accidentally
missed. Some of these areas corresponded to beam directions that
were obviously bad, which might imply that those possibilities were
deliberately skipped. Other holes contained prospective beam
directions that were good enough to deserve consideration, implying that
these areas were accidentally missed by the subject. In most cases the
beam directions that required the subject to look straight up or
straight down were not covered, as the HMD would exert very large
torques on the subject's neck in these positions.

Most subjects relied very heavily on the marker to provide a
reference point in the model. The marker served as a landmark that
facilitated quick movement between two diametrically opposed
beam directions, and was also typically used as a
“best-beam-direction-so-far” marker to which the subject would return for the
final solution after further exploration elsewhere. Many subjects
expressed a desire to have more than one marker. Most subjects did not
make use of the context provided by dodges uniquely colored in H I.ii
space for reference. Only one subject, whose own research is
concerned with the use of color, found the colors useful, so useful, in
fact, that the markers were never used.

Steering Mode Summaries

Immersion (IMM). Immersion mode produced significantly worse
scores than Orbital, Spaceball, and Joystick modes, and it appeared
to have instilled less confidence in the subjects than the other modes.
This may be a result of the subjects' being able to see only a small
portion of the model at any time, which, combined with the lack of
any head-motion parallax, could have hindered the subject's
development of a complete mental picture of the model. In addition,
subjects were required to evaluate prospective beam orientations by
looking in one direction and then in the other direction, with no clear
indication of where the boundary of the beam was. Immersion mode
did, however, have the advantage of providing the ability for the
subject to use muscle memory in navigation. Even without a
complete global understanding of the model, subjects knew how they
had to orient their heads to get back to a particular beam direction.

Orbital (ORB). Despite the fact that there is no real-world metaphor
for this steering mode, Orbital mode produced significantly better
scores than Immersion, 6-D Mouse, and Walk/Rotate modes. This
may have been due to the unique combination of several factors.
Orbital mode provides a beam's-eye view of the model, which at
once gives the subject an external global view of the model and
allows the subject to easily determine which dodges intersected the
beam. Another contributing factor is the aid to navigation through
muscle memory provided by Orbital mode.

Walkaround (WLK). Walkaround mode produced the longest mean
task completion time, but was undistinguished in score and subject
confidence. The long task completion time is not surprising, given
the difficulty of walking about in the virtual world in an HMD that
seals off any view of the real world. Most subjects found this mode
very awkward and time consuming, and ranked Walkaround low in
ease of use and preference. Interestingly, this mode more than any
other was used for systematic searches. One subject repeatedly
circled around the model, inspecting the model at different heights
with each loop. Another subject opted to walk less and inspect the
model vertically at regular intervals around the model. Perhaps the
awkwardness of the mode instilled in these subjects a need for a
disciplined, efficient approach.

Walkaround/Rotation (WKR). Walk/Rotate mode did not perform
any better than Walkaround mode, but fared better in ease and
preference rankings. The model rotation capability was used to
different degrees by the different subjects. Most subjects walked
very little and spent most of their time standing still and rotating the
model as in 6-D Mouse mode. Some trials showed no rotation at all,
perhaps indicating a reluctance in the subject to lose the navigational
advantage provided by a fixed model reference frame.

Joystick (JOY). Joystick mode ranked very high in ease-of-use and
preference, probably because most of the subjects worked with
computers and were somewhat familiar with video games. Even so,
performance with Joystick mode was not notable. Trial replay
revealed that most subjects used only principal axis rotations, i.e.
they rotated models mostly vertically and horizontally and very little
diagonally. This was probably due to the mechanical action of the
joystick, which required slightly more effort to move diagonally.
The effect of this restriction is unclear, for while it forced subjects to
decompose their movements into a series of principal axis rotations,
it provided a precision of movement not available with the other
non-head-tracked modes.


6-D Mouse (SDM). Compared to subjects' preference for Joystick
mode and dislike for Walkaround mode, response to 6-D Mouse
mode was relatively flat. Its performance was undistinguished from
the other modes. Trial replays showed that this mode suffered greatly
from tracker latency, which greatly hindered both precise alignment
and movements large enough to require more than one grab-release
cycle. Consequently, beam source traces for 6-D Mouse mode were
characterized by a very jagged appearance with large direction
changes separating relatively small rotations.

Spaceball  (SPC).  The  performance  of  Spaceball  mode  is  relatively 
undistinguished,  but  its  preference  rankings  are  weighted  toward  the 
low  end.  Many  subjects  found  the  Spaceball  fatiguing  and  difficult 
to  use  for  precise  movements. 

General  Comments 

Perhaps most compelling is the large inter-subject variance seen in this
experiment, which may have masked significant differences
between steering modes and between the collective head-tracked
modes and the non-head-tracked modes. Standardized tests of
spatial orientation and spatial visualization [4] may provide a
normalizing factor to reduce this variance.

Another interesting observation is the large variation seen in the
preference and ease-of-use histograms of the head-tracked modes. There
was no general consensus about which of the four modes was the
best, although Walkaround was generally considered the worst. This
suggests that to be widely accepted, an HMD-based targeting tool
should have an adaptable user interface that lets users choose the
steering mode they want to use. One must also consider, however,
that a task so critical as targeting of treatment beams demands
optimal performance. Orbital mode provides better performance
than the other three, and will be carried over into the next experiment,
involving true, anatomical beam-targeting. Since score is not
correlated with mode preference, it is expected that performance of users
who do not like Orbital mode will not suffer from having to use it.

CONCLUSIONS 

Collected data show no significant difference between head-tracked
steering modes and non-head-tracked steering modes in the
performance of an abstract beam targeting task. Orbital mode provided the
best overall performance, Immersion mode the worst. The three
non-head-tracked modes were not distinguished by performance.

ACKNOWLEDGEMENTS

Support for this research was received from:

Defense Advanced Research Projects Agency, Contract No. DAEA
18-90-C-0044.

Digital Equipment Corporation, Research Agreement No. 582.

National Science Foundation, Grant No. CDA-8722752.

Office of Naval Research, Grant No. N00014-86-K-0680.

REFERENCES 

1. Ware, C. and Osborne, S., Exploration and virtual camera control
in virtual three dimensional environments. In Proc. 1990
Symposium on Interactive 3D Graphics (Snowbird, UT, Mar.
'90). Computer Graphics, 24, 2 (Mar. 1990), 175-183.

2. Ware, C. and Slipp, L., Exploring virtual environments using
velocity control: A comparison of three devices. In Proc. Human
Factors Soc. 35th Ann. Mtg. (San Francisco, Sep. '91). HFS,
1991, pp. 300-304.

3. Mackinlay, J.D., Card, S.K., and Robertson, G.G., Rapid controlled
movement through a virtual 3D workspace. Computer Graphics,
24, 4 (Aug. 1990), 171-176.

4. McGee, M.G., Human Spatial Abilities. Praeger, New York,
1979.


Interactive  Manipulation  and  Display  of 
Two-Dimensional  Surfaces  in  Four-Dimensional  Space 


David  Banks 

Department  of  Computer  Science 
University  of  North  Carolina  at  Chapel  Hill 


Abstract 

Surfaces in 4-space generally produce self-intersections when
projected to 3-space. The geometry of the projected surface
changes as the surface rotates rigidly in 4-space. This paper
presents techniques for interacting with such a surface, for
recovering the geometry and depth information that the
projection destroys, for computing the intersections and the
surface when projected to 3-space, and for computing the
silhouettes and the surface when projected to the screen. These
techniques are part of an interactive system called Fourphront,
which uses Pixel-Planes 5 as the graphics engine.

1  Introduction 

Versatile high-performance graphics machines let us
interactively manipulate surfaces in four dimensions. The
projective geometry and linear algebra required for the job are
well known [Semple], but surfaces in 4-space present challenges
in designing a user interface and a set of visualization cues. This
paper presents techniques to address these problems, using
Pixel-Planes 5 as the graphics platform. In particular, we present
techniques for gathering 3D input to manipulate a surface in
4-space, for providing visualization cues, and for applying 4D
depth cues. These techniques are at the heart of an interactive
system called “Fourphront.”

Why study surfaces in 4-space? One reason is that topologists
have yet to classify all the 3-dimensional compact surfaces, but
have succeeded with the 2-dimensional surfaces (k-holed
donuts and their non-orientable counterparts). Many of the
2-dimensional surfaces require four dimensions in which to
imbed, and none of the compact 3-dimensional surfaces can
imbed in three dimensions of Euclidean space. It might be
enlightening to examine and compare surfaces that are
topologically equivalent and that inhabit four dimensions of
space. Do they look alike or not?

It is difficult even to illustrate the 3D classification problem
with genuine examples; these are volumes without boundaries,
residing in up to seven dimensions of space. Even the
2-dimensional surfaces may require four dimensions for their
imbedding. Interactive computer graphics can be of service by
providing a window on these surfaces in 4-space.


Figure 1. A user in 3-space manipulates a surface in 4-space, which
projects to 3-space and then onto the screen.


The three steps of our task (figure 1) are (§2) mapping input
from user space to object space, (§3 and §4) projecting from
object space to illumination space, and (§5) projecting from
illumination space to the screen.

2  Mapping  User  Input  to  World 
Transformations 

The illusion of reality is strongest when the user controls the
scene that he views. Dynamic control of the transformation
matrices requires an input device that offers a natural means for
producing the object’s motion. There are ten degrees of
freedom that we wish to control for manipulating objects in
4-space: four extents of translation in the axial directions (x, y, z,
w), and six Euler angles of rotation within the axial planes (xw,
yw, zw, xz, yz, xy). The 4D rotations look very much like their
3D counterparts, although it becomes more appropriate to
think of rotations occurring within a plane rather than
occurring about an axis (figure 2). In 3-space, rotations leave a
1-dimensional subspace fixed; that subspace is the rotation axis.
In 4D, rotations leave a 2-dimensional subspace fixed, while
permuting the points within the 2-dimensional rotation plane
and within the bundle of planes parallel to it. In general, the
rotation matrix for the $x_i x_j$ plane ($i < j$) contains the elements
$a_{ii} = a_{jj} = \cos t$, $a_{ij} = -a_{ji} = (-1)^{i+j} \sin t$, and the remaining
elements $a_{kl} = \delta_{kl}$. For a more thorough treatment of Euler
angles in 4-space, and how to specify orientation, see
[Hoffman]. The challenge in assigning the ten degrees of
freedom in 4-space to input devices that exist physically in
3-space is to promote kinesthetic sympathy: the similarity of
input-motion to object-motion [Gauch].


Figure 2. These are three of the six axial planes in xyzw-space,
defined by the axis pairs xw, yw, and zw. The other three axial planes
(xz, yz, and xy) lie in the 3-dimensional xyz-subspace.

2.1 Mapping 2D Input to 3D Transformations

The fact that an input device is constrained within a physical
3-dimensional world will impair kinesthetic sympathy. The
question is, how much? This problem is very familiar in a
different guise, namely, how to effect direct 3D manipulations
with a 2D locator such as a mouse. In this case there are six
degrees of freedom (three Euler angles and three orthogonal
translations) to associate with a 2-dimensional input space. The
popular techniques are to overload the input space, to partition
the input space, to discard a dimension of control, or to create a
cross-product of the input space by using multiple locators. The
following is a highly compressed review of these techniques.

2.1.1 Overloading the Input Mapping

We can overload the input space (x’, y’, z’) by extracting x’ and
y’ components of the locator’s velocity, and assigning the
magnitude of circular acceleration to the z’ component
[Evans]. Converting these components into translations in x, y,
and z preserves sympathy for x and y, and naturally suggests a
screw-translation for z. An important drawback to mapping the
input space this way is that the locator’s velocity and
acceleration are not decoupled. If the user wants to change the
direction of the locator’s motion, that change necessarily
produces a circular acceleration and hence a z-translation in
world space (figure 3).


Figure 3. At the bottom point of this circular trajectory, the mouse’s
velocity is purely horizontal, while its acceleration is purely vertical.


2.1.2 Partitioning the Input Space

We can partition the input space into components, each of
which maps the locator motion to the object motion in a
different manner. The partition can be explicit, by determining
in which of several control areas a cursor lies [Chen]. The
partition can be implicit, by comparing the motion of the 2D
locator to the orientation of a 3D cursor that is projected to
input space [Nielson]. Whichever mapping is employed, the
user must be prepared to change his motion when the input
space switches context, and must be aware of which mapping is
being invoked.

2.1.3 Discarding Input Mappings

There  are  several  ways  to  discard  a  degree  of  control  in  order  to 
eliminate  a  dimension  from  the  range  of  the  input  mapping. 
For  example,  two  angles  determine  a  position  on  the  unit  2- 
sphere.  Rather  than  specify  three  Euler  angles,  we  can  use  the 
locator’s  velocity  vector  to  determine  a  rotation  of  the  unit  2- 
sphere  and  hence  of  the  3-space  it  inhabits.  Alternatively,  we 
can  map  the  input  space  to  the  tangent  space  at  a  point  on  a 
surface [Nielson, Bier, Hanrahan, Smith], in order to control the
motion  of  the  object  by  controlling  its  motion  within  that 
tangent  plane.  Of  course  the  locator’s  motion  becomes  less 
sympathetic  as  the  tangent  plane  deviates  from  the  image  plane. 
A  more  abstract  problem  is  that  path-planning  can  become  very 
difficult  when  it  requires  a  route  through  successive  tangent 
planes  to  reach  a  target  orientation.  A  surface  in  3-space  that 
isn't  closed  or  that  isn’t  everywhere  differentiable  may  possess  a 
Gauss map that does not cover the unit sphere. Such a surface is
difficult  or  impossible  to  orient  by  controlling  it  through  its 
tangent  or  normal  space. 

2.1.4  Taking  a  Cross  Product  of  the  Input  Space 

By using k locators, each with n degrees of freedom, we permit
nk degrees of freedom in the input space. These can be realized
either as k physical locators, as one logical locator with a k-way
selector to map physical-to-logical, or as a hybrid of the two.
Thus, a single mouse button can select between two mappings
of the mouse position into the world [Chen].

2.2  Mapping  Spaceballs  and  Joysticks  to  4D 
Transformations 

What does the experience of mapping 2D input to 3D
manipulation suggest for mapping 3D input into 4D
manipulation? Consider each of the four approaches outlined
above. (1) Overloading the input space can produce
transformations in 4-space as side effects of an attempted 3D
manipulation, side effects which novice users cannot easily
undo. (2) Nielson’s method for partitioning a locator’s
2-dimensional space extends to 3D for translation, but it does not
lend itself to rotations. (3) There are problems with discarding
one or more dimensions of manipulation. First, mapping a
velocity vector in 3-space into rotations of the unit 3-sphere in
4-space  is  a  promising  idea,  but  it  is  difficult  to  restrict  the 
input  so  as  to  rotate  the  projection  of  the  object  within  its 
projected  3D  subspace.  Second,  the  bigger  the  dimension  of  the 
space,  the  less  of  it  can  be  visited  by  excursions  in  a  2D  tangent 
plane  to  a  point  on  a  surface,  so  exploiting  local  surface 
properties  pays  a  much  smaller  dividend  than  it  did  in  3-space. 
(4)  Using  multiple  input  devices  can  be  inconvenient, 
requiring  ten  sliders  or  dials,  five  mice,  four  3D  joysticks,  or 
two  six-degree-of-freedom  spaceballs. 

What  choice  is  best?  There  may  be  no  single  optimal  technique, 
but  multiple  input  devices  at  least  promise  a  great  deal  of 
kinesthetic  sympathy  if  their  input  space  is  3-dimensional.  The 
relative  novelty  of  interactive  manipulation  in  4-space  is  a 
powerful  motivation  for  designing  a  sympathetic  interface.  Not 
many  people  have  developed  a  sense  of  how  surfaces  look  as 
they  rotate  in  4-space.  Consequently,  we  do  well  to 
approximate  that  motion  as  closely  as  possible  by  the  motion 
of  the  input  device.  Of  the  devices  listed  above,  spaceballs  and 
joysticks provide the most degrees of freedom. How then can
we use them to create sympathetic motion in 4-space?

Translations and rotations within an input plane x’y’ can
sympathetically  and  uniquely  map  to  motion  within  an  image 
plane  defined  by  the  xy  plane  in  world  space.  But  the 
projection  from  4-space  to  the  screen  will  annihilate  two 
orthogonal  directions  z  and  w,  together  with  the  2- 
dimensional  plane  they  define.  This  plane  will  apparently  go 
"into"  the  screen  at  each  point.  Translation  in  the  z  or  w 
directions  and  rotation  in  the  xw,  yw,  xz,  or  yz  planes  thus 
present  a  problem.  If  the  input  device  moves  toward  the  screen, 
we  can  legitimately  map  that  motion  either  to  z  or  w.  Either 
choice  preserves  kinesthetic  sympathy,  but  the  map  is  not 
unique.  Rotation  in  the  zw  plane  is  also  problematic.  There  is 
no  physical  rotation  of  a  3D  input  device  sympathetic  to  this 
4D  rotation,  since  (in  our  physical  3-space)  such  a  rotation 
would be confined to the 1-dimensional input space z’. The
sympathetic  maps  are  tabulated  below  (figure  4). 


Figure 4. The mappings of 3D input space to 4D world space that
promote kinesthetic sympathy.

Despite the ambiguities, there are still reasonable ways to
convert input from a spaceball or a joystick into 4D
transformations. A spaceball offers six degrees of freedom: three
translations (x’,y’,z’) and three rotations (x’y’,x’z’,y’z’). To
extract ten degrees of freedom requires two spaceballs, either
physically or logically.

The mapping from input space to object space can be defined as
follows. Spaceball 1 assigns (x’,y’,z’) to (x,y,z) for calculating
translations and rotations. Spaceball 2 re-interprets the z’
coordinate, assigning it to w instead of to z. Spaceball 2 also
makes the exception that rotations in its x’y’-plane map to
rotations in the world’s zw-plane. This rotation is not
sympathetic, but, as pointed out above, no rotation in input
space can be sympathetic to a zw rotation. Note that two
physical spaceballs compete to produce x and y translations
under this scheme; it is necessary then to squelch one
spaceball’s input to these translations. This makes the
two-spaceball solution somewhat unattractive.
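
The two-spaceball mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the dictionary layout of a spaceball reading, and the choice to squelch the second spaceball's x’ and y’ translations are hypothetical, not taken from Fourphront. The rotation matrix follows the axial-plane form given in section 2 (cosines on the two diagonal entries of the plane, a signed sine pair off the diagonal, identity elsewhere).

```python
import math

AXES = {"x": 0, "y": 1, "z": 2, "w": 3}

def plane_rotation(plane, t):
    """4x4 rotation by angle t (radians) within an axial plane such as 'xw'."""
    i, j = sorted(AXES[a] for a in plane)
    m = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    s = (-1) ** (i + j) * math.sin(t)   # sign convention (-1)^(i+j)
    m[i][i] = m[j][j] = math.cos(t)
    m[i][j] = s
    m[j][i] = -s
    return m

def apply(m, p):
    """Multiply a 4x4 matrix by a 4-vector."""
    return [sum(m[r][c] * p[c] for c in range(4)) for r in range(4)]

def map_spaceballs(ball1, ball2):
    """Combine two 6-DOF readings into 4 translations and 6 plane angles.

    Each reading is a dict (hypothetical layout):
      'trans': (x', y', z')
      'rot':   {'xy': angle, 'xz': angle, 'yz': angle}  # input-space planes
    """
    t1, t2 = ball1["trans"], ball2["trans"]
    # Spaceball 2's x' and y' are squelched (assumption); its z' drives w.
    translation = [t1[0], t1[1], t1[2], t2[2]]
    r1, r2 = ball1["rot"], ball2["rot"]
    rotation = {
        "xy": r1["xy"], "xz": r1["xz"], "yz": r1["yz"],
        "xw": r2["xz"], "yw": r2["yz"],   # z' re-interpreted as w
        "zw": r2["xy"],                   # the non-sympathetic exception
    }
    return translation, rotation
```

Together the two readings supply all ten degrees of freedom: four translations and six axial-plane angles, each of which can be turned into a matrix with `plane_rotation`.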

3D joysticks that use twist (about the joystick axis) as the third
degree of freedom can map in a similar way to the spaceballs,
using two joysticks to mimic the mappings of a single
spaceball. The joystick rotates in each of three planes based at a
common origin. Two of the rotations feel like translations for a
short interval: when the joystick is centered, a rotation in its
x’z’ or y’z’ planes is momentarily a linear translation in the x’
or y’ direction (figure 5). We exploit this duality to
sympathetically map these two motions into either rotation or
translation in 4-space. Twist is not kinesthetically sympathetic
to translation, but is at least suggestive of forward motion that
results from rotating a screw.


Figure 5. The 3D joystick rotates in the x’z’, y’z’, and x’y’ planes,
which can produce a momentary translation in the x and the y
directions. In the input space coordinates, x’ is rightward, y’ is forward,
and z’ is vertical.

We need four (physical or logical) joysticks in order to supply
the ten degrees of freedom necessary in 4-space. We can map
pairs of (logical) joysticks the same way we map the spaceballs.
Each pair allocates translations to one joystick and rotations to
the other. Since joysticks have a small range of motion, it is
wise to treat their input as velocity rather than position when
gross manipulations are desired.

The  two  mapping  schemes  are  summarized  in  the  following 
table  (figure  6).  The  subscripts  indicate  which  logical  locator 
supplies the input.

Figure 6. The mappings of spaceball and joystick input that promote
kinesthetic sympathy in 4D world space.

It is inconvenient to re-home the hands from one set of
joysticks to another in the midst of manipulating an object.
Fourphront therefore uses only two physical joysticks, one for
each hand, multiplexed as four logical ones. One physical
joystick functions as a logical pair that always maps (x’,y’,z’)
to (x,y,z). This physical joystick embodies logical joysticks 1
and 3 in the table above. The other physical joystick
(corresponding to logical joysticks 2 and 4 in the table) maps
(x’,y’,z’) to (x,y,w), with the same caveat that it
nonsympathetically maps rotations from the x’y’ input plane
to the zw world plane. A binary state variable (governed by a
joystick button) determines whether to produce translations or
rotations.
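
A minimal sketch of this multiplexing, treating joystick deflection as a velocity as suggested earlier. The stick labels "A" and "B", the function names, the `gain` parameter, and the ordering of the deflection triple (tilt in x’z’, tilt in y’z’, twist in x’y’) are assumptions made for illustration; the figure 6 table is not reproduced here, so the assignments below are inferred from the prose only.

```python
# Stick "A" maps (x', y', z') to (x, y, z); stick "B" maps it to (x, y, w).
# Deflection order: (tilt in x'z', tilt in y'z', twist in x'y').
TRANSLATE = {"A": ("x", "y", "z"), "B": ("x", "y", "w")}
ROTATE = {"A": ("xz", "yz", "xy"), "B": ("xw", "yw", "zw")}  # B's twist -> zw

def new_state():
    """Ten accumulated degrees of freedom: 4 translations, 6 plane angles."""
    names = ("x", "y", "z", "w", "xy", "xz", "yz", "xw", "yw", "zw")
    return {name: 0.0 for name in names}

def update(state, stick, rotate_mode, deflection, dt, gain=1.0):
    """Integrate one joystick sample as a velocity over a time step dt.

    rotate_mode is the binary state variable toggled by the joystick button.
    """
    targets = (ROTATE if rotate_mode else TRANSLATE)[stick]
    for name, d in zip(targets, deflection):
        state[name] += gain * d * dt
    return state
```

For example, twisting stick "B" with the button state set to rotation accumulates angle in the zw plane, while the same twist in translation mode moves the object along w.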

It  is  not  uncommon  to  decouple  the  positioning  and 
orientation operations in the input domain. Experience shows
that users also decouple 4D manipulations (the ones that
involve the w-axis in world space) from 3D manipulations
[Hoffman] in order to inspect the change that was made to the
3D projection by moving the model in 4-space. So there is some
justification  in  this  splitting  of  the  joystick  control  into  four 
parts.  The  other  natural  decomposition  would  assign  logical 
joysticks  1  and  2  to  one  device,  and  joysticks  3  and  4  to  the 
other. 

3  Projecting  to  3D:  Intersections, 
Transparency,  and  Silhouettes 

The same technique for projecting surfaces from 3-space to
2-space applies to projection from 4-space to 3-space. A
perspective projection requires an eye point eye_4 in 4-space. In
(non-homogeneous) normalized eye-space coordinates, the
point (x, y, z, w) projects to (x/w, y/w, z/w) in the 3-dimensional
image volume. A second eye point eye_3 within that volume
determines a further projection to the final image plane (figure
7).


Figure 7. The (x y z w)-axes (left) project in the w-direction to the (x y
z)-axes (middle), which project in the z-direction to the (x y)-axes of
the image plane.
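
The two-stage projection can be sketched as follows. The function names are hypothetical, and the second stage assumes the 3D point is already expressed in the normalized eye-space coordinates of the second eye point; a real pipeline would insert a viewing transformation between the stages.

```python
def project_4d_to_3d(p):
    """Perspective-project (x, y, z, w) to (x/w, y/w, z/w); requires w > 0."""
    x, y, z, w = p
    if w <= 0:
        raise ValueError("point must lie in front of the 4D eye (w > 0)")
    return (x / w, y / w, z / w)

def project_3d_to_2d(q):
    """Perspective-project (x, y, z) to (x/z, y/z); requires z > 0."""
    x, y, z = q
    if z <= 0:
        raise ValueError("point must lie in front of the 3D eye (z > 0)")
    return (x / z, y / z)

def project_to_screen(p):
    """Compose the two projections: 4-space -> 3-space -> image plane."""
    return project_3d_to_2d(project_4d_to_3d(p))
```

Two distinct points on one ray through the 4D eye, such as (1, 2, 3, 1) and (2, 4, 6, 2), land on the same point of the image volume.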


The  typical  side-effect  of  projection  is  that  the  resulting  surface 
intersects  itself  in  3-space,  even  if  it  has  no  intersection  in  4- 
space. Why is that? The self-intersections arise when a ray from
eye_4 strikes the surface twice, since both of the intersection
points must map to a single point in 3-space. This is the usual
situation for a closed surface in 4-space, just as it is for a closed
curve in 3-space: the shadow of a “curvy” space curve exhibits
self-intersections through most of its orientations.

A  surface  is  imbedded  if  it  has  no  self-intersections  or 
singularities. An imbedded surface locally looks like a
neighborhood in the plane: no creases, no crossings. If a
surface imbeds in three dimensions, there’s little need (from the
standpoint of topology) to study it in four; thus the interesting
surfaces are generally the ones that contain self-intersections
when projected to 3-space, because they fail to imbed there.
None of the one-sided surfaces imbed in 3-space. Happily, all
of the topological surfaces have incarnations that imbed in
4-space.

Typically a surface that we transform and rotate on our graphics
machines is the boundary of a solid object, whether the object
be a house or a mountain range. Such a surface may be
geometrically  complex,  but  it  dutifully  performs  a  crucial 
topological  service:  it  separates  3-space  into  an  inside  and  an 
outside.  We  can  tour  the  surface  from  the  inside  (as  with  a 
building  walkthrough)  or  from  the  outside  (as  with  a  flight 
simulation  over  rugged  earth)  until  we  have  developed  a 
sufficiently  complete  mental  model  of  it.  We  need  not  cross  the 
surface  to  the  other  side. 

By  contrast,  a  self-intersecting  surface  separates  3-space  into 
any  number  of  subsets.  If  the  surface  is  opaque,  some  or  most  of 
its  pieces  remain  hidden  during  a  tour  of  a  particular  volume 
that  it  bounds.  Rotating  the  surface  in  4-space  may  reveal  a 
patch  of  surface  that  was  previously  hidden,  but  only  at  the 
expense  of  another  portion  of  the  surface  that  is  now  obscured. 
The  fundamental  problem  of  displaying  such  surfaces  is  that 
they  continually  hide  their  geometry  from  us.  Three  popular 
ways to tackle this problem are to use ribboning, clipping, and
transparency.  Overall,  transparency  is  the  most  helpful,  but  it 
has  certain  drawbacks  which  we  repair  in  §5. 

3.1  Ribboning 

To  reveal  the  geometry  of  a  self-intersecting  surface,  we  can 
slice it into ribbons [Koçak]. The gaps between ribbons reveal
parts of the object that would otherwise be obscured. One
advantage of ribboning is that it can be performed once, at
model definition time, and then left alone. Some of the
drawbacks are that (1) any already-existing non-ribboned
datasets must be remeshed and ribboned, (2) the high-frequency
edges of thin close ribbons attract the attention of the eye, at
the expense of the geometric content of the surface, and (3)
ribbons can produce distracting moiré patterns when they
overlap. 

These  drawbacks  do  not  mean  that  ribboning  is  a  clumsy 
technique.  On  the  contrary,  for  surfaces  that  can  be  foliated  by 
1-dimensional curves, ribboning is a very elegant means of
visualization.  The  compact  surfaces  that  admit  such  a  foliation 
are  the  torus  and  the  Klein  bottle.  Banchoff  has  made 
productive  use  of  this  technique  to  illustrate  the  foliation  of 
the  3-sphere  in  4-space  by  animating  a  ribboned  torus  that 
follows  a  trajectory  through  the  3-sphere. 

Surfaces  with  other  topologies  do  not  admit  such  a  simple 
ribboning. We can slice a surface along level cuts as it sits in
4-space, but the cuts will sometimes produce x-shaped
neighborhoods in the ribbons. Morse theory determines
whether a surface can be successfully ribboned: the singularities
of a Morse function on a surface must all be degenerate with the
topology of a circle [Milnor, Morse].

3.2  Clipping 

Rather  than  pre-compute  sections  of  the  surface  to  be  sliced 
away,  we  can  clip  them  out  dynamically.  The  chief  advantages 
are that (1) many graphics machines implement fast hither
clipping  as  part  of  their  rendering  pipeline;  (2)  no  special 
treatment  is  required  for  the  representation  of  the  model,  and 
(3)  by  clipping  the  surface  as  it  moves,  the  user  can  inspect 
views  of  it  that  a  single  static  segmentation  cannot  anticipate. 

There are drawbacks to clipping. We usually think of clipping a
surface against a plane. In fact, clipping is properly a geometric
intersection of a surface against a 3-dimensional volume whose
boundary is the clipping plane. In 4-space a plane does not
bound a volume, just as a line does not bound an area in
3-space. Instead, a 4-dimensional halfspace clips the surface, and
the boundary of the halfspace is a 3-dimensional flat, or
hyperplane.  It  is  true  that  a  user  could  interactively  specify  the 
position  and  orientation  of  the  4D  halfspace  that  does  the 
clipping,  just  as  he  can  control  the  position  and  orientation  of 
the  surface  under  scrutiny.  But  consider  the  problem  of 
providing  visual  feedback  to  show  where  that  clipping  volume 
is.  The  shape  of  the  clipped  surface  implicitly  defines  where  the 
boundary  of  the  clipping  volume  is.  In  3-space  we  can  mentally 
reconstruct  the  orientation  of  that  volume  from  the  clipped 
edges  it  leaves  behind.  It  is  much  harder  to  reconstruct  the 
orientation  of  a  clipping  volume  in  4-space  based  on  the  shape 
of the region it clips away. We might indicate the orientation of
the 4D clipping halfspace by volume-rendering its boundary.
Unfortunately,  that  boundary  will  tend  to  hide  the  surface  that 
remains  after  clipping. 

Recall that the immediate problem is to view the component
pieces of a self-intersecting surface. In particular, we want to see
beyond a patch of surface that hides another patch behind it,
“behind” being in the z-direction of the 3-dimensional space to
which the surface has been projected. If this is truly the driving
problem, we can sufficiently address it by clipping in that
3-dimensional space, and clipping strictly in the z-direction. This
amounts  to  nothing  more  than  hither  clipping.  To  summarize: 
clipping  in  4-space  is  mathematically  easy  but  interactively 
hard.  For  the  purpose  of  revealing  hidden  interiors,  however, 
hither  clipping  suffices. 
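
Hither clipping of a projected polygon amounts to a one-plane polygon clip. The sketch below uses the Sutherland-Hodgman edge walk against a single plane; it is an illustration, not the hardware clipper mentioned above. The function name is hypothetical, and it assumes that z increases away from the viewer, so that everything nearer than z = hither is removed.

```python
def clip_hither(polygon, hither):
    """Clip a 3D polygon, given as a list of (x, y, z), against z >= hither.

    Returns the vertices of the part that survives (possibly empty).
    """
    out = []
    n = len(polygon)
    for k in range(n):
        a, b = polygon[k], polygon[(k + 1) % n]
        a_in, b_in = a[2] >= hither, b[2] >= hither
        if a_in:
            out.append(a)
        if a_in != b_in:
            # The edge crosses the hither plane: emit the crossing point.
            t = (hither - a[2]) / (b[2] - a[2])
            out.append(tuple(a[i] + t * (b[i] - a[i]) for i in range(3)))
    return out
```

Moving `hither` deeper into the scene peels away successively more of the frontmost patches, exposing the patches behind them.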


Figure 8. Clipping into a torus produces a figure-eight contour. Clipping
reveals internal geometry, but complex contours can confuse the
shape.


Hither  clipping  has  other  problems.  The  shape  of  the  surface 
region  that  gets  clipped  away  can  be  very  complex.  A  simple 
shape  is  one  that  is  topologically  equivalent  (homeomorphic) 
to  a  disk.  In  general  it  is  easier  to  make  sense  of  surfaces  whose 
clipped  regions  have  simple  shapes  rather  than  complex  shapes 
[Francis],  but  intersections  and  saddle  points  on  a  surface  cause 
the clipped regions to look complex (figure 8). Secondly, a
clipping plane cuts into a concave region of a surface only by
cutting into the neighboring regions as well. This is not
necessarily the effect a user wants to achieve. Both of these
shortcomings can be remedied by using more exotic,
custom-shaped clipping volumes. Thirdly, clipping the frontmost
patches of a surface exposes some of the hindmost patches,
which may be behind the center of rotation for the objects. The
visible part of the surface then seems to rotate in the direction
antisympathetic to the input motion. This shortcoming is
independent of the shape of the clipping volume.

3.3  Transparency 

Ribboning and clipping simulate transparency via a binary
classification. Both classify parts of the surface as completely
opaque and the other parts as completely transparent. Why not
use transparency outright? Ideally a semi-transparent surface
presents all of its self-intersecting components on the screen so
that the shape of each layer is discernible. In practice the effect
is dramatic and helpful for many surfaces. But there are several
things that can hinder the usefulness of transparency.

Disappearing  intersections.  The  intersection  of  two  opaque 
surface  patches  A  and  B  is  readily  apparent  whenever  their 
colors  differ.  On  one  side  of  the  intersection  we  have  A  atop  B 
(yielding A’s color); on the other side B atop A (yielding B’s
color).  As  the  patches  become  simultaneously  more  transparent, 
their  colors  blend  and  the  intersection  becomes  less 
distinguishable.  Intersection  curves  figure  prominently  in  the 
study  of  nonimbedded  surfaces,  so  it  seems  a  shame  to  apply 
transparency  at  their  expense. 
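
The fading of an intersection curve can be quantified with a small sketch. It assumes the common "over" blend, result = alpha * front + (1 - alpha) * behind, with both patches sharing one opacity alpha; the function names and the black default background are illustrative choices. Under that assumption the color difference across the intersection works out to alpha squared times the difference of the two patch colors, so it falls off quadratically as the patches become more transparent.

```python
def over(front, back, alpha):
    """Blend a front color over a back color with opacity alpha."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(front, back))

def seam_contrast(color_a, color_b, alpha, background=(0.0, 0.0, 0.0)):
    """Largest per-channel difference between 'A atop B' and 'B atop A'."""
    a_atop_b = over(color_a, over(color_b, background, alpha), alpha)
    b_atop_a = over(color_b, over(color_a, background, alpha), alpha)
    return max(abs(p - q) for p, q in zip(a_atop_b, b_atop_a))
```

With a red patch and a blue patch, the contrast is 1 when both are opaque and drops to 0.25 at 50% opacity.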

Disappearing silhouettes. A surface with many
self-intersections may require a great deal of transparency to make
the deep layers visible, but then the outermost layer becomes
nearly invisible. In particular, it becomes difficult to see the
outline, or silhouette, of a very transparent surface, because the
silhouette includes the rim of the nearly-invisible outermost
layer. 

Reduced performance. Rotations in 4-space change the
geometry of a surface’s 3D projection. Polygons that were
disjoint one frame ago now interpenetrate. Polygons that were
on the outermost side trade places with polygons on the
innermost. Opaque polygons can be rendered in any order, so
long as only the nearest polygons (in screen depth) survive the
rendering process. On the other hand, transparent polygons can
be rendered from back to front or from front to back, but in any
case they must be rendered in sorted order. The dynamic 3D
geometry caused by 4-space rotations prevents us from ordering
the model by a static data structure in 3-space, such as a binary
space partition (BSP) tree [Fuchs83]. Does the BSP tree extend
to surfaces in 4-space? Alas it does not; a polygon partitions
3-space by the plane in which it lies. But a plane does not separate
4-space.

In short, to render transparent polygons we must be prepared to
sort them dynamically, perhaps even splitting them to
eliminate interpenetrations. But that is computationally
expensive,  and  hence  slow. 

Loss  of  3D  depth  cue.  It  is  true  that  an  opaque  self-intersecting 
surface hides parts of itself that we want to see, but that opacity
serves a positive purpose: to disambiguate 3D depth on a 2D
display. Obscuration is a powerful depth cue. A hidden
polygon is obviously farther away than the visible polygon
atop it. Transparency reduces or eliminates this depth cue,
leaving us to rely on other cues to recover 3D depth. One
especially  helpful  cue  is  specular  reflection. 

Specular highlights reveal surface geometry in two ways. The
shape of a surface is easy to see along its silhouette, but is not so
apparent in the neighborhoods that are viewed head on. Phong
highlights help exaggerate the curvature, thereby
distinguishing the shape of a neighborhood. Where two
translucent surface patches interpenetrate, the Phong
highlights can disambiguate which surface is in front,
especially when we rock the surface back and forth. Moreover,
the highlights can disambiguate the different layers that
transparency reveals. The benefit diminishes, of course, as the
number of transparent layers increases, but the effect is
appreciable through three or four layers.

Transparency is an essential tool for studying surfaces in
4-space, since it reveals the behavior of the patches that intersect
each other, and since any given surface is likely to exhibit
self-intersections when it is projected to 3-space. But transparency
comes  with  a  price.  It  subdues  intersections  and  silhouettes.  It 
makes  rendering  slower.  It  makes  depth  more  ambiguous. 

In  order  to  redeem  transparency  as  a  tool  for  rendering  surfaces 
in  4-space,  we  can  address  these  demerits  in  the  following  ways. 
(1) Highlight the intersection curves. (2) Highlight the
silhouette curves. (3) Order the polygons in sub-linear time. (4)
Apply  Phong  shading  to  recover  some  sense  of  3D  depth. 

Finding  the  intersections  and  silhouettes  could  be  slow,  and 
these  curves  will  often  change  with  every  frame.  In  §5  we 
discuss  techniques  for  computing  them  after  the  second 
projection, from 3-space to the screen. The algorithms exploit
the logic-enhanced memory on board Pixel-Planes 5.
Fourphront uses these techniques in the presence of
transparency and Phong shading by taking advantage of the
underlying algorithms on Pixel-Planes: multipass transparency
and deferred shading. In (back-to-front) multipass transparency,
the model is sent to the SIMD renderers multiple times. On each
pass, a pixel processor retains the geometry of the backmost
polygon that it has not previously retained, then blends the
shaded result into a temporary frame buffer. This technique
requires two z-buffer areas per pixel processor. Deferred shading
extracts the shading operation common to all primitives, and
postpones applying the operation until after all the primitives
have been z-buffered. Thus, only the necessary state
information  (e.g.,  color,  reflectivity,  normal,  transparency)  is 
stored  per  pixel  at  the  time  the  geometry  of  the  primitive  is 
rendered. 
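
The back-to-front multipass idea can be sketched for a single pixel. This is a software illustration only, not the Pixel-Planes implementation: the function name and the fragment tuple layout are hypothetical, larger depth is assumed to mean farther from the viewer, and fragments at exactly equal depth are not handled. The two depth values `z_prev` and `z_cur` play the role of the two z-buffer areas mentioned above.

```python
def multipass_pixel(fragments, background):
    """Composite unordered fragments back to front, one layer per pass.

    fragments: list of (depth, color, alpha) covering this pixel.
    Returns (final_color, number_of_passes).
    """
    color = background
    z_prev = float("inf")      # depth retained on the previous pass
    passes = 0
    while True:
        z_cur, best = float("-inf"), None
        # The whole model is re-sent on every pass.
        for depth, frag_color, alpha in fragments:
            # Keep the backmost fragment that is nearer than z_prev.
            if z_cur < depth < z_prev:
                z_cur, best = depth, (frag_color, alpha)
        if best is None:
            return color, passes
        frag_color, alpha = best
        color = tuple(alpha * f + (1.0 - alpha) * c
                      for f, c in zip(frag_color, color))
        z_prev = z_cur
        passes += 1
```

The number of passes equals the number of layers covering the pixel, which is why the cost grows with the depth complexity of the projected surface rather than requiring a global sort of the polygons.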

4  Projecting  to  3D:  Depth  Cues 

There  are  several  cues  that  lend  a  3D  effect  to  images  on  a 
computer  screen.  Among  them  are  obscuration,  shadows, 
illumination,  perspective,  parallax,  stereopsis,  focus,  and 
texture.  These  are  natural  cues  that  we  use  every  day  to  derive  a 
3D  model  of  our  world  from  the  2D  image  of  it  on  our  retinas. 

But  now  we  confront  a  serious  problem.  By  projecting  the 
image  of  a  surface  in  4-space  down  to  a  2-dimensional  screen, 
not only do we lose depth information in the z-direction, but
we  lose  it  in  the  w-direction  as  well.  What  4-dimensional  depth 
cue  does  our  retina  employ  that  we  can  now  supply  when  we 
render  the  surface?  Evidently  there  is  none.  Since  both  the  z 
and  the  w  directions  are  perpendicular  to  the  screen,  we  might 
try  applying  some  of  the  usual  z-depth  cues  as  w-depth  cues. 
This strategy risks ambiguating the two depths, of course. The
alternative  is  to  invent  w-depth  cues  that  have  no  basis  in  our 
physical  experience.  How  do  the  usual  z-depth  cues  extend  to 
four  dimensions? 

4.1  Obscuration  and  Shadows 

We  can  drop  down  a  dimension  and  liken  the  situation  to 
viewing 1-dimensional curves in 3-space. Space curves rarely
obscure  or  cast  shadows  on  each  other:  only  at  isolated  points, 
in  general.  Similarly,  surfaces  in  4-space  only  obscure  each 
other  or  cast  shadows  on  each  other  along  mere  isolated  curves 
(in general). The result is that these cues are not especially
helpful  for  recovering  w-depth. 

4.2  Illumination 

Again we consider the lower dimensional analog to our
problem. Illumination is ill-defined along a curve in 3-space,
since a space curve has an entire plane for its normal directions.
The usual illumination equation does not apply. Several
researchers have observed that any surface with co-dimension 1
submits to ordinary lighting techniques, and have jumped
ahead to illuminating 3-dimensional surfaces in 4-space
[Burton, Carey]. Burton lets a polygon inherit the normal
vector  of  the  3-dimensional  volume  whose  boundary  includes 
it.  This  is  like  illuminating  a  polygonal  surface  in  3-space,  but 
only  displaying  the  result  on  the  polygonal  mesh.  The  problem 
with  non-orientable  surfaces  imbedded  in  4-spacc  is  that  they 
do  not  bound  any  volume  at  all,  Hansen  inflates  a  surface  to  a 
small  3-dimensional  volume,  like  wrapping  a  tube  around  a 
space  curve,  and  then  illuminates  that  bounding  volume  in  4- 
space  and  volume-renders  it  [Hansen].  The  images  arc 
satisfying,  but  the  technique  is  fairly  slow,  since  rendering 
volumes  is  considerably  slower  than  rendering  polygons. 

Illuminating surfaces in 4-space is thus an unresolved problem.
Fourphront postpones illumination until the surface is
projected into 3-space, so that shading looks familiar and
realistic on the projected surface, and so that this strong
z-depth cue is preserved. This strategy is at least as old as 1880,
when it was used to shade polygonal faces as though they were
illuminated in 3-space [Stringham]. The obvious drawback
with this approach is that the shading in 3-space reveals more
about the shape of the projected surface than about the shape of
the surface as it lies in 4-space.

4.3 Perspective

A perspective projection from 3-space to 2-space behaves like
an orthogonal projection where 3-space is pre-warped: planes
parallel to the image plane are first shrunk or magnified
according to their distance. A perspective projection from
4-space to 3-space has the same general effect. Volumes shrink
that are distant from, and parallel to, the volume of projection,
but volumes grow that are close to the center of projection eye4.
In particular, translating a neighborhood in the w-direction
causes its projection to shrink and approach the origin. This
behavior can disambiguate relative w-depth. The nearer
neighborhood changes size faster than the farther one.
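
The pre-warping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Fourphront's code: the eye eye4 is assumed to sit on the negative w-axis at distance `eye_dist` from the projection volume w = 0, and the names `project_4d_to_3d` and `eye_dist` are hypothetical.

```python
import numpy as np

def project_4d_to_3d(points, eye_dist=3.0):
    """Perspective-project 4D points (x, y, z, w) into 3-space.

    Assumption for this sketch: the eye lies at w = -eye_dist and the
    volume of projection is the hyperplane w = 0.
    """
    points = np.asarray(points, dtype=float)
    dist = points[:, 3] + eye_dist          # w-distance from the eye
    if np.any(dist <= 0):
        raise ValueError("point lies at or behind the eye")
    scale = eye_dist / dist                 # >1 near the eye, <1 far away
    return points[:, :3] * scale[:, None]

# The same 3D offset (1, 1, 1) placed at three different w-depths.
pts = [[1, 1, 1, 0.0], [1, 1, 1, 3.0], [1, 1, 1, -1.5]]
print(project_4d_to_3d(pts))
```

A point translated in the positive w-direction is scaled toward the origin, while a point close to the eye is magnified, which matches the shrink-and-grow behavior the text describes.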

4.4 Stereopsis and Parallax

Parallax and stereopsis are side-effects of perspective
projection, and they offer additional w-depth cueing
[Armstrong]. Consider the effect of translating the eye. Objects
at various depths in the world change their relative positions
when the eye shifts in the x or y directions. But which eye
position (eye3 or eye4), and which depth (z or w)?

Let us again drop down a dimension and examine the situation.
Consider a viewpoint eye3 in 3-space, and the image plane to
which the world projects (figure 10). Within that plane there is
a second viewpoint eye2 and an image line to which the scene
projects further. Two spheres A and B in the 3D world project to
two disks A' and B' in the image plane, and then to two
segments A" and B" in the image line. Suppose A" and B" are
only slightly separated. If eye2 shifts to the right and A" shifts
to the right relative to B", we conclude that A' is farther away
than B'. But that does not imply that the source object A is
farther from eye3 than B. It can be the case that shifting eye3 to
the right causes A" to shift left instead (relative to B").
Translating eye3 and eye2 together couples these behaviors. The
situation in 4-space is the same. We have a choice of where to
apply a translation. Applying it before the projection from
4-space to 3-space produces nonintuitive motion, due to the
parallax from the w direction: the projected object is no longer
rigid under the expected isometries, although the source object,
of course, still is.


Figure 10. When there are two eye positions involved in projecting an
image, either of them can produce parallax. In this figure, spheres A
and B project from 3-space onto a 2-dimensional plane as disks. The
disks project to a 1-dimensional line as segments. By tilting the page
obliquely, you can see what the second eye sees. Moving an eye to the
right will make the farther object seem to move to the right of the nearer
object. Which sphere looks closer? It depends on which eye does the
measuring. A is closer to eye3 than B is. But the projection of B is closer
to eye2 than the projection of A is.


4.5 Texture

The  texture  applied  to  a  surface  can  be  defined  dynamically  in 
world space, so that as the surface moves in the w direction, the
texture  changes.  One  of  the  simplest  textures  is  color 
modulated  according  to  depth.  This  texture  is  well-known  as 
intensity  depth  cueing.  In  3-space  there  is  a  convenient 
metaphor  for  an  intensity  depth  cue  -  the  object  looks  as 
though  it  were  obscured  by  fog,  and  the  fog’s  color  prevails  as 
the  object  recedes.  In  practice,  the  4D  fog-metaphor  is 
considerably  less  convincing,  perhaps  because  the  usual  3D 
interpretation  is  so  much  more  natural. 

Encoding w-depth by color is nonetheless a useful tool,
especially for locating level sets according to the color they
share. The idea is evidently pretty obvious, since there are very
old examples of its use [Hinton]. A more modern treatment of
the strategy might be to apply a dynamic texture to a surface,
where the texture continually flows in the w-direction
[Freeman, van Wijk].

4.6  Focus  and  Transparency 

The human eye can focus at various depths. Neighborhoods of a
surface that lie within the focal plane in 3-space appear crisp.
Neighborhoods that are nearer or farther look increasingly
blurry. There are various techniques for producing this effect
during rendering [Haeberli, Mitchell, Potmesil].

In 4-space we could define a focal volume at some particular
distance in w. Neighborhoods within this volume would
appear crisp, while neighborhoods outside would be
progressively blurry. In general this is not a fast process, since
blurry polygons are effectively semitransparent, and hence
incur some of the cost of computing transparency. But we can
approximate the effect cheaply by simply modulating
transparency by w-depth. If the focal volume is at the yon
distance, transparency will unambiguously determine w-depth.
Recall that neighborhoods near to eye4 are generally large due
to perspective, and often enclose the far-away neighborhoods
that have shrunk toward the origin. If the outermost patches of
a surface are opaque, they hide the interior geometry. This is the
motivation for choosing a focal volume at the yon, rather than
the hither, distance: it is more likely to reveal the interior of a
self-intersecting surface. Unfortunately, the eye does not
resolve transparency with a great deal of precision, so this
technique is best applied for gross classification of relative
distances in the w direction.
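
One simple way to modulate transparency by w-depth is a clamped linear ramp between the hither and yon distances. The sketch below is an assumption made for illustration (the text does not specify the mapping Fourphront uses); the function name `opacity_from_w` and the linear ramp are hypothetical.

```python
def opacity_from_w(w, hither, yon):
    """Map w-depth to opacity: transparent at hither, opaque at yon.

    Assumes a linear ramp, clamped outside [hither, yon]. The linear
    shape is a choice made for this sketch only.
    """
    if yon == hither:
        raise ValueError("hither and yon must differ")
    t = (w - hither) / (yon - hither)
    return min(max(t, 0.0), 1.0)

print(opacity_from_w(5.0, hither=0.0, yon=10.0))  # 0.5
```

With the focal volume at the yon distance, nearby (large, enclosing) patches come out mostly transparent and the far-away interior stays visible.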

5 Finding Silhouettes and Intersections
During Projection to 2D

This section describes a screen-oriented technique for locating
silhouette curves and intersection curves. In §3 we described the
powerful advantage transparency gives for visualizing
self-intersecting surfaces, but noted that although transparency lets
us see more layers of the surface, it strips those layers of some of
their geometric content. In particular, the intersections and
silhouettes are less apparent on transparent surfaces.

We can estimate the amount of computation required for
calculating the geometry of these curves and for rendering
semi-transparent surfaces. The conclusion is that even for a
modest-sized polygonal model, the burden on the traditional
front end of a graphics system becomes too great.
Programmable SIMD renderers let us shift some of the
computation away from the math processors on Pixel-Planes 5,
which makes it possible to display silhouettes and intersection
curves of a dynamic 3D (projected) surface at interactive rates.

5.1 Calculating in 3-space

Consider the task of manipulating a surface composed of n =
2000 triangles (this is a skimpy polygon budget to spend on
self-intersecting surfaces). The cost of transforming and
ordering these semi-transparent triangles, along with
calculating their silhouettes and intersections, is substantial.
Depending on the particulars of the algorithms we employ, we
can easily spend O(n log n) floating-point operations sorting
the polygons (as required for transparency) and computing
their intersections. Since the geometry is dynamic in 3-space as
the surface rotates in 4-space, this cost is charged per frame. The
transformations and projections from 4-space to the screen can
take another 250n floating-point operations. So we easily face
over 1.5 million floating-point operations for this meager data
set. These estimates disregard all other necessary operations; the
front-end system must sustain well over 30 MFLOPS in order to
calculate the intersecting geometry at interactive speeds of
20 Hz. By using multiple CPUs to achieve this speed, we incur
substantial communication cost or memory contention. In
either case, the time complexity is super-linear in the number of
polygons. The conclusion: avoid sorting and avoid
analytically computing the intersections in 3-space.
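
The arithmetic behind the 30 MFLOPS figure can be checked directly. The sketch below only restates the numbers given in the text (250n for the transformations, a per-frame total of 1.5 million operations, a 20 Hz frame rate); the helper name `required_mflops` is hypothetical, and the constant hidden in the O(n log n) term is not given in the text, so it is not modeled here.

```python
def required_mflops(flops_per_frame, frame_rate_hz):
    """Sustained rate, in MFLOPS, needed to finish each frame on time."""
    return flops_per_frame * frame_rate_hz / 1e6

n = 2000                          # triangles in the example surface
transform_flops = 250 * n         # transformation and projection cost
total_flops_per_frame = 1.5e6     # per-frame total quoted in the text

print(transform_flops)                              # 500000
print(required_mflops(total_flops_per_frame, 20))   # 30.0
```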

Pixel-Planes 5 offers programmable SIMD logic-enhanced
frame-buffers (the renderers) that can offload much of the
burden from the geometry processors [Ellsworth, Fuchs89]. In
particular, we can use the SIMD renderers to order the
polygons, to find the silhouettes, and to find the intersections.
For the case of 2000 triangles, the renderers can relieve the
geometry processors of over half their floating-point burden
and reduce their communication cost.

5.2 Silhouette Curves

Analytic Solution. There are several ways to define a
silhouette. In common usage, a silhouette is the boundary of
the projection of a surface onto the image plane. But a more
generous definition counts any point on a differentiable
surface as a silhouette (or contour) point if the eye vector lies
within the tangent plane to the surface at that point. The second
choice is preferable for self-intersecting surfaces, since we wish
to highlight the silhouettes of the component patches that nest
inside a transparent image. A simple way to find a silhouette
(whose transverse is non-inflecting) is to locate every edge that
is shared by two polygons, one facing forward and the other
facing backward from the eye. But if the polygon data is
distributed among many processors, the processor that owns a
given polygon will not necessarily hold the neighboring ones,
even for a mesh that is static in 3-space. Note too that this
technique only identifies silhouettes along mesh boundaries of
a polygonal representation of the model, and not in the
polygons' interiors.

We can analytically compute the silhouette for surface patches
that are defined parametrically [Schweitzer, Lane], but this does
not take advantage of the SIMD renderers of Pixel-Planes.

Screen-based Solution. Consider a screen-oriented approach
to finding silhouettes. As a routine step in Phong-shading, the
Pixel-Planes renderers hold the information necessary to locate
silhouettes, namely, the interpolated surface normals and the
eye vector. Each renderer covers a region on the screen and
holds hundreds of bits of information per pixel in the region.
These pixels are operated on in SIMD fashion. If the normal to a
point on a polygon is orthogonal to the eye vector, the point
lies on a silhouette curve.

We can use the renderers to perform a dot product between the
normal vector and the eye vector at every pixel, which
identifies the silhouette if the dot product is zero. (If the eye is
sufficiently far away, the projection is nearly orthogonal, and it
suffices to test just the z-component of the normal.) This yields,
at best, a 1-pixel-thick line on a curved surface; at worst, it
misses most pixels on the silhouette because of the imperfect
sampling of the normal vector. We might treat a pixel as a
silhouette point if the dot product is within some threshold $\epsilon$ of
zero, thereby enlarging the silhouette's thickness on the screen
(figure 11).
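
The per-pixel threshold test can be imitated on a CPU with array operations standing in for the SIMD renderers. This is a sketch under stated assumptions, not the Pixel-Planes microcode: normals and eye vectors are assumed to be nonzero and stored per pixel in NumPy arrays, and the names `silhouette_mask` and `eps` are hypothetical.

```python
import numpy as np

def silhouette_mask(normals, eye_vectors, eps=0.1):
    """Flag pixels whose unit normal is nearly orthogonal to the eye vector.

    normals, eye_vectors: arrays of shape (..., 3), broadcastable
    against each other. Both are normalized here, so eps is a bound
    on |cos(angle)| between the two directions.
    """
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    e = eye_vectors / np.linalg.norm(eye_vectors, axis=-1, keepdims=True)
    d = np.sum(n * e, axis=-1)        # per-pixel dot product
    return np.abs(d) < eps

# Three pixels: facing the eye, exactly edge-on, and almost edge-on.
normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.05]])
eye = np.array([0.0, 0.0, 1.0])
print(silhouette_mask(normals, eye, eps=0.1))
```

Raising `eps` thickens the detected curve but, as the following paragraphs explain, it does so unevenly and can introduce false silhouettes.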

But thresholding has problems. As $\epsilon$ gets large, false silhouettes
appear wherever the surface is sufficiently edge-on to the eye,
and the silhouette becomes much fatter in some places than in
others. The false silhouettes are inherent to thresholding since,
for example, a planar section of the surface that contains the
eye may have an inflection whose tangent lies arbitrarily close
to the eye vector. The inflection point will appear as a
silhouette point, even though there may be no silhouette in its
vicinity.


Figure 11. The surface normal is nearly orthogonal to the eye vector in
the vicinity of a silhouette curve.


The reason that the thresholded silhouette has varying
thickness is that the curvature of the surface may vary from
place to place. A silhouette point with a large magnitude of
normal curvature in the silhouette's transverse direction will
witness its normal vector changing direction quickly along a
path toward the eye. A large value of $\epsilon$ may still produce a thin
silhouette region. Meanwhile, a silhouette point with a small
magnitude of normal curvature in the transverse direction will
witness its normal vector changing direction slowly along a
path toward the eye. The same value of $\epsilon$ produces a thick
silhouette, since there are points over a large area (even as seen
from the eye) whose normals are nearly perpendicular to the eye
vector.

Note  that  silhouettes  need  not  be  computed  when  a  polygon 
first  enters  the  pixel’s  memory.  We  need  only  look  for 
silhouettes  on  visible  polygon  fragments  that  ultimately 
survive z-buffering. We defer shading until after the polygons
have  been  transformed  and  their  z-buffered  geometry 
(including  normal)  has  been  stored  in  the  pixel  memory.  Thus 
we  incur  the  expense  of  silhouette  computation  only  once  per 
frame  (or,  for  multipass  transparency,  only  once  per  pass),  rather 
than  once  per  polygon. 

Having found a silhouette, what do we do with it? The question
concerns visualization in its abstract sense: how can we
effectively map the internal state at a pixel onto the available
dimensions of output (e.g., red, green, and blue)? A simple
solution is to map silhouettes to a particular color that is
known to be absent elsewhere in the rendered surface. Such a
color may not, of course, exist. But assigning a constant color
on the silhouette of a smoothly shaded surface is often, in
practice, a sufficient visualization. In the case of a transparent
image, it can also be effective to assign complete opacity to a
silhouette in order to make it stand out. In fact, we can relax the
binary classification of silhouettes in favor of a real-valued
measure of "silhouetteness." If the intrinsic opacity of the
surface at a point is $\alpha$, let the effective opacity be $1-(1-\alpha)^{1/d}$,
where d is the dot product of the eye vector and the normal
vector. Surfaces then become increasingly opaque near their
silhouettes, which mimics the natural behavior of transparent
laminas. Viewed away from the normal by an angle whose
cosine is d, a lamina of width w intercepts a ray through a
distance w/d.
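
The real-valued "silhouetteness" rule is easy to evaluate numerically. The sketch below assumes unit-length eye and normal vectors so that d is a cosine, and it clamps d away from zero to avoid a division by zero exactly on the silhouette; the name `effective_opacity` and the clamp value `min_d` are assumptions made for this example.

```python
def effective_opacity(alpha, d, min_d=1e-6):
    """Opacity of a lamina with intrinsic opacity alpha viewed at cosine d.

    The ray travels through w/d of material instead of w, so the
    transmitted fraction (1 - alpha) is raised to the power 1/d.
    """
    d = max(abs(d), min_d)            # guard against d == 0 on the silhouette
    return 1.0 - (1.0 - alpha) ** (1.0 / d)

for d in (1.0, 0.5, 0.25):
    print(d, effective_opacity(0.5, d))
```

For an intrinsic opacity of 0.5 the effective opacity rises from 0.5 when the surface faces the eye (d = 1) to 0.75 at d = 0.5 and 0.9375 at d = 0.25, approaching full opacity at the silhouette.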

5.3 Intersection Curves

If the projected surface in 3-space were static, we could
analytically compute the intersection curves [Baraff, Moore]
once and for all. Since transformations in 4-space make its
3-space projection change shape dynamically, we recompute it
each frame. This can be accomplished easily within the SIMD
renderers. The straightforward approach to finding intersections
is to modify the usual z-buffer algorithm. We test the z-value of
each incoming polygon at each pixel against the contents of
the z-buffer, retaining the polygon's state information if the
polygon is closer. If the new value matches the z-buffer, we
count it as an intersection. If we have flagged an intersection
and then a closer polygon comes along, we unset the
intersection flag. The result is that all the frontmost
intersections will be flagged.
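
The modified z-buffer update described above can be written out for a single pixel. This is a minimal sketch, not the Pixel-Planes implementation: it assumes smaller z means closer to the eye and uses exact equality for the match, and the class name `Pixel` and method `submit` are hypothetical.

```python
import math

class Pixel:
    """One pixel of a z-buffer that also tracks a frontmost-intersection flag."""

    def __init__(self):
        self.z = math.inf            # depth of the closest fragment so far
        self.polygon = None          # state retained for that fragment
        self.intersection = False

    def submit(self, z, polygon_id):
        if z < self.z:
            # A closer polygon replaces the stored one and clears the flag.
            self.z = z
            self.polygon = polygon_id
            self.intersection = False
        elif z == self.z:
            # Same depth as the stored fragment: count it as an intersection.
            self.intersection = True
        # Farther fragments are ignored.

p = Pixel()
for z, name in [(5.0, "a"), (3.0, "b"), (3.0, "c"), (4.0, "d")]:
    p.submit(z, name)
print(p.polygon, p.intersection)
```

In this run polygons "b" and "c" meet at the frontmost depth, so the flag ends up set; submitting a still closer fragment afterwards would clear it again.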

The proof of correctness is easy. Let $\{P_i\}$ be the set of polygons
that cover a pixel, indexed by the order in which they arrive,
and let $P_j$ and $P_k$ ($j<k$) be two of them that participate in the
front-most intersection at that pixel. The z-buffer must contain
$z_j$ after $P_j$ is processed. Since $P_j$ is frontmost at the pixel, the
z-buffer still contains $z_j$ when $P_k$ is processed, thereby setting the
intersection flag. Since $P_k$ is frontmost at the pixel the flag will
not be unset. At the end of the pass, we have found an
intersection. By piggy-backing on the multipass algorithm for
transparency, we can find all the interior intersections, since
they will be frontmost intersections at some particular pass.

Two  polygons  that  share  an  edge  formally  intersect  each  other 
along  it.  Polygons  whose  edges  pass  through  pixel  centers  will 
“intersect”  at  those  pixels.  These  are  spurious  intersections,  and 
not  the  kind  of  intersection  we  are  trying  to  show.  We  could  be 
careful  not  to  scan-convert  pixels  more  than  once  on  the 
common  boundary  of  adjacent  polygons.  This  technique 
presents  a  problem  for  a  machine  like  Pixel-Planes,  which  is 
suited  to  rendering  entire  polygons  as  primitives,  without 
maintaining connectivity information. But in fact the pixel
already holds sufficient information to eliminate spurious
intersections: surface normals. The intersections we wish to
highlight are those of polygons diving through each other,
whose normals are different where they interpenetrate. Since the
SIMD renderers interpolate vertex normals, that information is
available per pixel. We can thus modify the z-comparison,
requiring that the dot product of the new normal with the old
normal be less than unity in magnitude.

Exact matching against the z-buffer can identify at best a
1-pixel-wide intersection curve. At worst it misses much of the
curve due to imperfect sampling (just as is the case with
silhouette curves). We remedy this problem by thresholding. If
the incoming pixel is within $\epsilon$ of the z-buffer value, we
consider it an intersection point. This introduces the same
artifact of variable-width curves on the screen. If two polygons
intersect each other at a shallow angle, their separation remains
small over a large area of the screen, and the curve that satisfies
$|z_{new} - z_{old}| < \epsilon$ is many pixels wide. If they intersect each other
at a steep angle, a short excursion to neighboring pixels will
find them separated far apart. We can use the interpolated
normals  of  the  polygons  at  pixels  near  the  intersection  in  order 
to  approximate  a  fixed-width  intersection  curve.  But  note  that 
the  added  computation  is  charged  per  polygon,  and  cannot  be 
deferred  to  end-of-pass  unless  we  retain  the  geometric  state  of 
both  polygons.  Also  note  that  most  implementations  of  the  z- 
buffer  algorithm  interpolate  reciprocal-z  across  the  polygon. 
Over  small  extents  or  for  large  original  values  of  z, 
thresholding  produces  nearly  the  same  behavior  even  when 
using  the  reciprocal.  But  for  locating  intersections  across  large 
ranges,  it  is  wise  to  recover  the  true  depth. 
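
The thresholded depth match and the normal test for rejecting spurious intersections can be combined into one per-pixel predicate. The sketch below is illustrative only: it assumes true (not reciprocal) depth values, and the names `is_intersection`, `eps`, and `normal_tol` are hypothetical. The tolerance `normal_tol` stands in for "less than unity in magnitude" under floating-point arithmetic.

```python
import numpy as np

def is_intersection(z_new, z_old, n_new, n_old, eps=0.01, normal_tol=1e-3):
    """Decide whether an incoming fragment intersects the stored one.

    Depths must agree to within eps, and the unit normals must differ,
    which rejects fragments of adjacent polygons sharing an edge on a
    smooth surface.
    """
    n_new = np.asarray(n_new, dtype=float)
    n_old = np.asarray(n_old, dtype=float)
    n_new = n_new / np.linalg.norm(n_new)
    n_old = n_old / np.linalg.norm(n_old)
    depths_match = abs(z_new - z_old) < eps
    normals_differ = abs(float(np.dot(n_new, n_old))) < 1.0 - normal_tol
    return depths_match and normals_differ

print(is_intersection(2.000, 2.004, [0, 0, 1], [1, 0, 0]))  # True
print(is_intersection(2.000, 2.004, [0, 0, 1], [0, 0, 1]))  # False
```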


Figure 12. At their common intersection, two polygons share z-values.
The z-values are within some threshold of each other along a thickened
intersection curve.


Another artifact of thresholding is that the thickened
intersection curve gets trimmed near silhouettes, since the
depth-comparison is strictly within the z-direction rather than
the normal directions of the participating polygons. This
artifact is hard to overcome without using pixel-to-pixel
communication.

6  Future  Work 

There are several research areas that this project has identified. A
hemi-3-sphere can be mapped to the input space of a spaceball.
How effective are the induced rotations in 4-space, and can the
user produce the rigid motion within the 3-space to which a
surface projects? Surfaces can be clipped in 4-space against
volumes with 3-dimensional boundaries. Are there effective
ways to shape, to position, and to display the volume or its
boundary interactively? Is there an effective algorithm (like
the BSP tree) for precomputing the rendering order for
polygons projected from 4-space to the screen? Is there a speedy
and natural way to illuminate surfaces in 4-space? What is the
best interface for producing uncoupled parallax in either
4-space or the 3-space to which it projects? In what ways can
texture be used as a w-depth cue? A quadric approximation to a
surface contains curvature information, which can improve
both the silhouette and intersection calculation for fixed-width
curves. What are fast ways to produce this second-degree
approximation and fast ways to use it on a per-pixel basis? Our
consideration of silhouettes was motivated by the loss of
geometric content that transparency produces. Hence we
discussed silhouettes as seen by eye3. What useful information
do eye4 silhouettes add to a surface?

7 Conclusions

The  shape  of  surfaces  in  4-space  can  be  difficult  to  comprehend. 
Interactive  computer  graphics  provides  an  excellent  tool  for 
making  the  surfaces  seem  more  real,  since  we  can  manipulate 
them  ourselves.  The  effort  is  full  of  trade-offs.  In  order  to 
control  all  the  degrees  of  freedom  in  4-space,  we  need  multiple 
input  devices  in  3-space.  We  can  apply  transparency  in  order  to 
reveal  the  interior  of  a  self-intersecting  projection,  but  then  we 
lose  the  intersections  and  the  silhouettes.  We  can  then  highlight 
those  special  curves,  but  at  the  expense  of  the  system’s 
performance  or  memory.  We  can  steal  some  of  the  usual  z-depth 
cues and use them as w-depth cues, but that tends to make
z-depth more ambiguous again.

This  paper  has  focused  on  shortcomings  of  the  various 
techniques  in  order  to  encourage  other  people  to  enter  the  fray 
and  invent  solutions.  Until  the  advent  of  the  powerful  graphics 
computers  we  have  today,  mathematicians  could  only  imagine 
interacting in four dimensions. Experience with Fourphront
demonstrates that the effort can pay off, that we can open a
window on the truly "virtual world" of four dimensions. The
collateral spinoffs are algorithms that can be of service to the
more pedestrian problems in three dimensions.

10 Illustrations

The  surfaces  in  the  color  plate  section  were  rendered  on  Pixel- 
Planes  5.  Each  surface  was  transformed,  illuminated,  and 
rendered  on  5  in  0.2  seconds  or  less,  and  each  has  between  s'< 
and  10k  polygons.  There  are  two  light  sources;  one  slightly  left 
of the eye, and one above and to the right of the eye.

Abstract

We briefly discuss hyperbolic geometry, one of the most
useful and important kinds of non-Euclidean geometry.
Rigid motions of hyperbolic space may be represented
by 4 x 4 homogeneous transformations in exactly the
same way as rigid motions of Euclidean space. This is a
happy situation for those of us interested in visualising
what life in hyperbolic space might be like, because it
means we can use existing graphics hardware and software
libraries to animate scenes in hyperbolic space.
We present formulas for computing reflections, translations,
and rotations in hyperbolic space. These are
a bit more complicated than the corresponding formulas
for Euclidean geometry, which emphasizes our need
for graphics libraries which allow completely arbitrary
4 x 4 transformations.

The use of 4 x 4 transformations to represent isometries
of hyperbolic space is not new; it has been used
since the discovery of non-Euclidean geometry in the
19th century. The new part of our work is the application
of this theory to real-time 3D computer graphics technology,
which for the first time ever is allowing mathematicians
to interactively explore hyperbolic geometry.

Introduction

The use of 4 x 4 matrices to represent affine transformations
of Euclidean 3-space is well-known in computer
graphics. Most graphics languages include provisions
for specifying 4 x 4 transformations, and most interactive
graphics workstations have the ability to multiply
4 x 4 matrices in hardware. These capabilities were designed
with Euclidean geometry in mind, because we
think of the space in which we live as Euclidean 3-space.

There are, however, alternate systems of geometry
which are of interest in mathematics and physics research
and education. One of the most important of
these is hyperbolic geometry. Hyperbolic space arises
naturally, even more so than Euclidean geometry, in
the study and classification of 3-manifolds. It is also
frequently taught in introductory geometry courses because
it is in some sense the simplest and most elegant
type of non-Euclidean geometry. Learning hyperbolic
geometry forces one to challenge many assumptions
which are usually taken for granted, in the process
strengthening one's geometric reasoning skills.

The “space” of hyperbolic geometry consists of the
interior of the unit ball in $\mathbb{R}^3$; the boundary of the ball,
the unit sphere, is “at infinity”. Distance is redefined
to approach infinity as we move closer to this sphere.
From a hyperbolic point of view, therefore, we can never
actually reach the boundary sphere. We can think of
hyperbolic space as consisting of points, lines, planes,
surfaces, etc, just as in Euclidean space. In hyperbolic
space, however, some of the rules of geometry are different.
Specifically, Euclid's fifth postulate is not valid:
in the hyperbolic plane there are many lines through a
given point which do not intersect a given line. Another
non-Euclidean property is that the sum of the angles in
a planar triangle is always less than 180 degrees. It
is possible, for example, to have a “regular right pentagon”
(all five sides are equal and all five angles are 90
degrees). Figure 1 shows a tessellation (tiling) of hyperbolic
2-space by such pentagons.

Figure 1: Tiling of the hyperbolic plane by regular right
pentagons. All angles in this picture are right angles in
the hyperbolic metric, and all pentagons are congruent.

These differences between Euclidean and hyperbolic
space mean that the intuition which we have from living
in what we perceive as essentially Euclidean 3-space
is of little value, and may actually hinder us, in an effort
to understand hyperbolic geometry. It would be
extremely useful, therefore, for researchers and geometry
students alike, to be able to experience some of what
life in hyperbolic space might be like.

Fortunately, since the transformations of hyperbolic
3-space can be represented as 4 x 4 matrices in much
the same way as with Euclidean transformations, we can
use the matrix capabilities of many graphics languages
and hardware systems to create images and to animate
motions in hyperbolic space. We must, however, be
able to use completely arbitrary 4 x 4 transformations,
because the matrices which arise in hyperbolic geometry
are different from those of Euclidean geometry.


Hyperbolic  Space 


In the following discussion we think of vectors as column
vectors; so $a$ represents the 4 x 1 matrix

$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}$$

and $a^T$ its transpose the 1 x 4 matrix $(a_1 \; a_2 \; a_3 \; a_4)$.
Thus $a^T b$ is the usual dot product of $a$ and $b$, and $a b^T$
is a 4 x 4 matrix, sometimes called the outer product of
$a$ with $b$.


In computer graphics, points in Euclidean 3-space
are commonly represented by homogeneous coordinates,
i.e. vectors in $\mathbb{R}^4$, where any two vectors
which are scalar multiples of each other are considered
to represent the same point. The 3-dimensional
coordinates $(a_1, a_2, a_3)$ of a point in $\mathbb{R}^3$ are called
its affine coordinates. We can convert affine coordinates
to homogeneous coordinates by appending
a 1 as the 4-th coordinate to obtain $(a_1, a_2, a_3, 1)$,
and we can convert arbitrary homogeneous coordinates
$(a_1, a_2, a_3, a_4)$ to affine coordinates by normalizing to
obtain $(a_1/a_4, a_2/a_4, a_3/a_4)$ (assuming $a_4 \neq 0$). The
advantage of homogeneous coordinates is that rigid
Euclidean motions (isometries), as well as perspective
projections, can be represented by multiplication by
4 x 4 matrices. The isometries of $\mathbb{R}^3$ correspond to
the semidirect product of the 3-dimensional orthogonal
group $O(3)$ with the 3-dimensional translation group.
Recall that an orthogonal matrix $M$ is one which preserves
the inner product of vectors: $M\mathbf{a} \cdot M\mathbf{b} = \mathbf{a} \cdot \mathbf{b}$.
The inner product in this case is

\[ \mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3, \]

where we assume that $\mathbf{a}$ and $\mathbf{b}$ are normalized.
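
The conversion between affine and homogeneous coordinates described above can be sketched in a few lines of Python. This is only an illustration; the function names `to_homogeneous`, `to_affine`, and `same_point` are hypothetical and do not come from the paper.

```python
def to_homogeneous(p):
    """Append 1 as the 4th coordinate of an affine point (a1, a2, a3)."""
    a1, a2, a3 = p
    return (a1, a2, a3, 1.0)


def to_affine(h):
    """Normalize homogeneous coordinates (a1, a2, a3, a4); requires a4 != 0."""
    a1, a2, a3, a4 = h
    if a4 == 0:
        raise ValueError("4th coordinate is zero: no affine point")
    return (a1 / a4, a2 / a4, a3 / a4)


def same_point(h1, h2, tol=1e-12):
    """True if two homogeneous vectors normalize to the same affine point."""
    return all(abs(x - y) <= tol for x, y in zip(to_affine(h1), to_affine(h2)))
```

For instance, `(1, 2, 3, 1)` and `(-2, -4, -6, -2)` are scalar multiples of each other, so `same_point` treats them as one point.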

Using other inner products yields non-Euclidean geometries.
The inner product

\[ \langle \mathbf{a}, \mathbf{b} \rangle_s = a_1 b_1 + a_2 b_2 + a_3 b_3 + a_4 b_4 \]

yields spherical geometry, and

\[ \langle \mathbf{a}, \mathbf{b} \rangle_h = a_1 b_1 + a_2 b_2 + a_3 b_3 - a_4 b_4 \]

yields hyperbolic geometry. Our treatment of hyperbolic
geometry is in terms of $\langle \cdot, \cdot \rangle_h$; analogous derivations
using $\langle \cdot, \cdot \rangle_s$ instead would yield the corresponding
formulas for spherical geometry. Note that the Euclidean
inner product, by ignoring the 4-th coordinate,
can be seen as a bridge between these two inner products.

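$\langle \cdot, \cdot \rangle_h$ is called the Minkowski inner product. The
Minkowski inner product can also be described as follows.
Let

\[ I^{3,1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]

Then $\langle \mathbf{a}, \mathbf{b} \rangle_h = \mathbf{a}^T I^{3,1} \mathbf{b}$. The group of 4 x 4 matrices
which preserve the Minkowski inner product is denoted
$O(3,1)$.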
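
As a rough sketch of what membership in $O(3,1)$ means in practice: a matrix $M$ preserves the Minkowski inner product exactly when $M^T I^{3,1} M = I^{3,1}$. The Python below checks this condition numerically. The helper names and the sample matrix `boost_x` (a hyperbolic "boost" along the first axis) are illustrative assumptions, not part of the paper.

```python
import math

# The matrix I^{3,1} = diag(1, 1, 1, -1).
J = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]


def minkowski(a, b):
    """Minkowski inner product <a, b>_h of two 4-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] - a[3] * b[3]


def matmul(A, B):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(A):
    return [[A[j][i] for j in range(4)] for i in range(4)]


def preserves_minkowski(M, tol=1e-9):
    """Check M^T I^{3,1} M == I^{3,1}, i.e. M lies in O(3,1)."""
    P = matmul(transpose(M), matmul(J, M))
    return all(abs(P[i][j] - J[i][j]) <= tol
               for i in range(4) for j in range(4))


def boost_x(s):
    """Sample element of O(3,1): mixes the 1st and 4th coordinates."""
    c, sh = math.cosh(s), math.sinh(s)
    return [[c, 0.0, 0.0, sh],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [sh, 0.0, 0.0, c]]
```

A uniform scaling of one coordinate, by contrast, fails the check, which is one way to see that not every 4 x 4 matrix is a hyperbolic isometry.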

Now consider the vectors $V_- = \{ \mathbf{a} \in \mathbb{R}^4 \mid \langle \mathbf{a}, \mathbf{a} \rangle_h < 0 \}$.
The set $V_-$ forms a solid cone along the 4-th axis
with vertex at the origin. Hyperbolic 3-space, denoted
$H^3$, is the projectivization of $V_-$, with the metric induced
by the Minkowski inner product; vectors in $V_-$
correspond to the homogeneous coordinates of points in
$H^3$. Each point in $H^3$ is represented by a unique vector
with 4-th coordinate 1, which can be obtained from any
vector in $V_-$ by normalization, just as in the Euclidean
case. (The fact that the vector lies in $V_-$ guarantees that
the 4-th coordinate is nonzero.) This gives a model of
$H^3$ consisting of those points of $V_-$ with 4-th coordinate
1; this is the same as the interior of the unit ball
in 3-space. Hyperbolic space thus consists only of the
points inside this ball.
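
The two claims in the paragraph above (the 4-th coordinate is nonzero, and the normalized points fill the open unit ball) follow directly from the definition of the Minkowski inner product. A short worked version, using $x_i = a_i / a_4$ as an assumed name for the normalized coordinates:

```latex
\begin{align*}
\langle \mathbf{a}, \mathbf{a} \rangle_h < 0
  &\iff a_1^2 + a_2^2 + a_3^2 < a_4^2
  &&\text{(so } a_4 \neq 0\text{)} \\
  &\iff a_4^2 \left( x_1^2 + x_2^2 + x_3^2 - 1 \right) < 0
  &&\text{(with } x_i = a_i / a_4\text{)} \\
  &\iff x_1^2 + x_2^2 + x_3^2 < 1 .
\end{align*}
```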

Two-dimensional hyperbolic space, also called the hyperbolic
plane, consists of the interior of the
unit disk. Although the discussion below is in terms of
hyperbolic 3-space, it extends straightforwardly to any
dimension. In particular, the illustrations and examples
we give are all in two dimensions (the 3rd coordinate
is 0) to simplify the computations and the figures.

The geodesics (straight lines) in this model of hyperbolic
space are the same as the Euclidean straight lines
passing through the unit ball, except that we only consider
the part of the line inside the ball. Similarly, the
hyperbolic planes in $H^3$ are the same as the Euclidean
planes.

Figure 2: Hyperbolic Reflections. Triangle abc is the
reflection of triangle a'b'c' in point p. The two triangles
are congruent in hyperbolic space, and hence would appear
to be of equal size to an observer inside the space.


The hyperbolic distance between two points $a$ and $b$
with homogeneous coordinates $\mathbf{a}$ and $\mathbf{b}$ is given by

\[ d^{hyp}(a, b) = 2 \cosh^{-1} \sqrt{ \frac{\langle \mathbf{a}, \mathbf{b} \rangle_h^2}{\langle \mathbf{a}, \mathbf{a} \rangle_h \, \langle \mathbf{b}, \mathbf{b} \rangle_h} }. \qquad (1) \]


A simple calculation shows that this formula is invariant
under multiplication of $\mathbf{a}$ and $\mathbf{b}$ by scalars, and hence
depends only on $a$ and $b$. It is also easy to verify that
if $a$ remains fixed and we let $b$ approach the boundary
of the unit ball, then $d^{hyp}(a, b)$ approaches infinity.
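
The distance formula (1) and the two properties just mentioned can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the sample points are the vertices a and b used in the reflection example of this paper, and the clamp to 1.0 is an added safeguard against floating-point rounding rather than part of the formula.

```python
import math


def minkowski(a, b):
    """Minkowski inner product <a, b>_h of two 4-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] - a[3] * b[3]


def hyperbolic_distance(a, b):
    """Formula (1), for homogeneous vectors a and b inside the cone."""
    ratio = minkowski(a, b) ** 2 / (minkowski(a, a) * minkowski(b, b))
    # Rounding can push the ratio slightly below 1 when a and b coincide.
    return 2.0 * math.acosh(math.sqrt(max(ratio, 1.0)))


a = (0.2, 0.0, 0.0, 1.0)
b = (-0.5, -0.5, 0.0, 1.0)

d = hyperbolic_distance(a, b)

# Invariance under rescaling of the homogeneous coordinates.
d_scaled = hyperbolic_distance(tuple(3.0 * x for x in a),
                               tuple(-2.0 * x for x in b))

# Distances from a to points approaching the boundary of the unit ball.
near_boundary = [hyperbolic_distance(a, (r, 0.0, 0.0, 1.0))
                 for r in (0.9, 0.99, 0.999)]
```

Here `d` and `d_scaled` agree up to rounding, and the entries of `near_boundary` grow without bound as the second point moves toward the unit sphere.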

The model of hyperbolic space that we are using here
is called the projective model, or the Klein model, after
the 19th-century mathematician who popularized
it. A more familiar model is the conformal model, also
known as the Poincaré model. In the conformal model,
geodesics are arcs of circles perpendicular to the boundary
sphere (or circle, in two dimensions). Each model of
hyperbolic space has its advantages and disadvantages.
The projective model seems better suited for visualization
and computer graphics, because geodesics appear
“straight” and the isometries can be represented by projective
linear transformations.


Matrix  Formulas 

The isometries of $H^3$ correspond to the matrices in
$O(3,1)$, just as the isometries of Euclidean 3-space correspond
to the matrices in $O(4)$. We now present formulas
for computing the matrices of rigid motions in
hyperbolic space.


Reflections 


One of the simplest types of isometries is a reflection. If
$\mathbf{p}$ represents the homogeneous coordinates of a point $p$ in
$H^3$, then the 4 x 4 matrix for the hyperbolic reflection
in $p$ is

\[ r_p^{hyp} = I - 2 \, \mathbf{p} \mathbf{p}^T I^{3,1} / \langle \mathbf{p}, \mathbf{p} \rangle_h. \qquad (2) \]

This same formula may be used to obtain the matrix
for the reflection in a plane as well. In this case, $\mathbf{p}$
represents the homogeneous coordinates of the plane.

Note: (2) can also be used to give the matrix for a
Euclidean reflection, by replacing $I^{3,1}$ with $I$ and the
Minkowski inner product with the dot product.
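
As a check on (2), one can verify directly that $r_p^{hyp}$ fixes the point $p$ and preserves the Minkowski inner product, so that it lies in $O(3,1)$. The short derivation below uses only the identity $\mathbf{p}^T I^{3,1} \mathbf{x} = \langle \mathbf{p}, \mathbf{x} \rangle_h$; the vectors $\mathbf{x}$ and $\mathbf{y}$ are arbitrary and are introduced here for illustration.

```latex
\begin{align*}
r_p^{hyp}\,\mathbf{x}
  &= \mathbf{x} - 2\,\mathbf{p}\,
     \frac{\langle \mathbf{p}, \mathbf{x} \rangle_h}
          {\langle \mathbf{p}, \mathbf{p} \rangle_h},
\qquad\text{so}\qquad
r_p^{hyp}\,\mathbf{p} = \mathbf{p} - 2\,\mathbf{p} = -\mathbf{p}
  \;\;\text{(the same point of } H^3\text{)}, \\[4pt]
\langle r_p^{hyp}\mathbf{x},\, r_p^{hyp}\mathbf{y} \rangle_h
  &= \langle \mathbf{x}, \mathbf{y} \rangle_h
   - 4\,\frac{\langle \mathbf{p}, \mathbf{x} \rangle_h
              \langle \mathbf{p}, \mathbf{y} \rangle_h}
             {\langle \mathbf{p}, \mathbf{p} \rangle_h}
   + 4\,\frac{\langle \mathbf{p}, \mathbf{x} \rangle_h
              \langle \mathbf{p}, \mathbf{y} \rangle_h
              \langle \mathbf{p}, \mathbf{p} \rangle_h}
             {\langle \mathbf{p}, \mathbf{p} \rangle_h^{2}}
   = \langle \mathbf{x}, \mathbf{y} \rangle_h .
\end{align*}
```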

To use (2) in an example, let $p = (0.5, 0, 0)$, and
consider the triangle with vertices $a = (0.2, 0, 0)$,
$b = (-0.5, -0.5, 0)$, and $c = (-0.5, 0.5, 0)$; see
Figure 2. Then we can use the homogeneous coordinates

\[ \mathbf{p} = \begin{pmatrix} 0.5 \\ 0 \\ 0 \\ 1 \end{pmatrix} \]

to obtain

\[ r_p^{hyp} = \begin{pmatrix} 1.666 & 0 & 0 & -1.333 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1.333 & 0 & 0 & -1.666 \end{pmatrix}. \]

To transform a point, say $a$, by this reflection, we multiply
its homogeneous coordinates $(0.2, 0, 0, 1)^T$ by this matrix
to obtain $(-1, 0, 0, -1.4)^T$, and then normalize to obtain the
point $a' = (0.714, 0, 0)$. Transforming $b$ and $c$ similarly
gives $b' = (0.929, 0.214, 0)$ and $c' = (0.929, -0.214, 0)$.

Although the two triangles in Figure 2 look very different
from a Euclidean point of view, they are congruent in
hyperbolic space. One may verify this by using (1) to
compute the hyperbolic lengths of the triangles' edges.
For example, $d^{hyp}(a, b) = d^{hyp}(a', b') = 2.074$. (Be
sure to use homogeneous coordinates in (1)!)
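
The reflection example can be reproduced with a few lines of Python. This is a sketch under the paper's formulas (1) and (2); the helper names and the variable names `a2`, `b2`, `c2` (standing for a', b', c') are chosen here for illustration.

```python
import math


def minkowski(a, b):
    """Minkowski inner product <a, b>_h of two 4-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] - a[3] * b[3]


def reflection(p):
    """Formula (2): I - 2 p p^T I^{3,1} / <p, p>_h, as a list of rows."""
    sign = (1.0, 1.0, 1.0, -1.0)  # diagonal of I^{3,1}
    pp = minkowski(p, p)
    return [[(1.0 if i == j else 0.0) - 2.0 * p[i] * p[j] * sign[j] / pp
             for j in range(4)] for i in range(4)]


def apply(M, v):
    """Multiply a 4x4 matrix by a column 4-vector."""
    return tuple(sum(M[i][j] * v[j] for j in range(4)) for i in range(4))


def to_affine(h):
    """Normalize homogeneous coordinates to affine coordinates."""
    return tuple(x / h[3] for x in h[:3])


def distance(a, b):
    """Formula (1), with a clamp against rounding below 1."""
    ratio = minkowski(a, b) ** 2 / (minkowski(a, a) * minkowski(b, b))
    return 2.0 * math.acosh(math.sqrt(max(ratio, 1.0)))


p = (0.5, 0.0, 0.0, 1.0)
a = (0.2, 0.0, 0.0, 1.0)
b = (-0.5, -0.5, 0.0, 1.0)
c = (-0.5, 0.5, 0.0, 1.0)

R = reflection(p)
a2, b2, c2 = (apply(R, v) for v in (a, b, c))  # images a', b', c'

# Edge lengths before and after the reflection.
edges_before = [distance(a, b), distance(b, c), distance(c, a)]
edges_after = [distance(a2, b2), distance(b2, c2), distance(c2, a2)]
```

The image vectors are left unnormalized on purpose: formula (1) is scale-invariant, so the edge lengths can be compared directly, and `to_affine` is only needed to recover the coordinates quoted in the text.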


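Translations


We can now define hyperbolic translations in terms of
reflections. Just as in Euclidean space, the translation
which takes a point $a$ to a point $b$ is the composition of
the reflection in $a$ with the reflection in the midpoint $m$
of $a$ and $b$:

\[ T_{a,b}^{hyp} = r_m^{hyp} \, r_a^{hyp}. \qquad (3) \]

The homogeneous coordinates $\mathbf{m}$ of the hyperbolic midpoint
are given by the formula

\[ \mathbf{m} = \mathbf{a} \sqrt{\langle \mathbf{b}, \mathbf{b} \rangle_h \langle \mathbf{a}, \mathbf{b} \rangle_h} + \mathbf{b} \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle_h \langle \mathbf{a}, \mathbf{b} \rangle_h}, \qquad (4) \]

where $\mathbf{a}$ and $\mathbf{b}$ are homogeneous coordinates for $a$ and
$b$, respectively.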

As an example, consider the triangle from Figure 2
again, and let $b' = (0.3, -0.7, 0)$. We compute the
matrix of translation $T_{b,b'}^{hyp}$. Using the homogeneous
coordinates for $b$ and $b'$ in (4) gives

\[ \mathbf{m} = \begin{pmatrix} -0.100 \\ -0.733 \\ 0 \\ 1.212 \end{pmatrix} \]

for the midpoint. Using (2) and (3) then gives

\[ T_{b,b'}^{hyp} = \begin{pmatrix} 1.676 & 0.814 & 0 & 1.572 \\ -1.369 & 0.636 & 0 & -1.130 \\ 0 & 0 & 1 & 0 \\ 1.919 & 0.257 & 0 & 2.179 \end{pmatrix}. \qquad (5) \]

The images of $a$, $b$, and $c$ under this transformation
are $a' = (0.744, -0.548, 0)$, $b' = (0.3, -0.7, 0)$, and $c' =
(0.846, -0.095, 0)$; see Figure 3.

To continue this example, we can translate $b'$ again
by (5) and obtain $b'' = (0.585, -0.771, 0)$, which lies
on the line containing $b$ and $b'$. The points $b$, $b'$, and
$b''$ lie at equally spaced intervals along this line in the
hyperbolic metric.
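
The translation example can be reproduced as follows. This sketch composes two reflections as described in the text and uses the midpoint formula (4); the function and variable names are illustrative, and the code assumes both input vectors have positive 4-th coordinate so that the square roots in (4) are real.

```python
import math


def minkowski(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] - a[3] * b[3]


def reflection(p):
    """Formula (2) as a 4x4 list of rows."""
    sign = (1.0, 1.0, 1.0, -1.0)  # diagonal of I^{3,1}
    pp = minkowski(p, p)
    return [[(1.0 if i == j else 0.0) - 2.0 * p[i] * p[j] * sign[j] / pp
             for j in range(4)] for i in range(4)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(4)) for i in range(4))


def to_affine(h):
    return tuple(x / h[3] for x in h[:3])


def distance(a, b):
    """Formula (1), with a clamp against rounding below 1."""
    ratio = minkowski(a, b) ** 2 / (minkowski(a, a) * minkowski(b, b))
    return 2.0 * math.acosh(math.sqrt(max(ratio, 1.0)))


def midpoint(a, b):
    """Formula (4); assumes a and b both have positive 4th coordinate."""
    ab = minkowski(a, b)
    wa = math.sqrt(minkowski(b, b) * ab)
    wb = math.sqrt(minkowski(a, a) * ab)
    return tuple(wa * x + wb * y for x, y in zip(a, b))


def translation(a, b):
    """Reflect in a first, then in the midpoint m of a and b."""
    return matmul(reflection(midpoint(a, b)), reflection(a))


a = (0.2, 0.0, 0.0, 1.0)
b = (-0.5, -0.5, 0.0, 1.0)
b1 = (0.3, -0.7, 0.0, 1.0)      # the point b'

m = midpoint(b, b1)
T = translation(b, b1)           # matrix (5)
a_img = to_affine(apply(T, a))   # a'
b_img = to_affine(apply(T, b))   # should coincide with b'
b2 = apply(T, b1)                # b'', left in homogeneous form
```

Comparing `distance(b, b1)` with `distance(b1, b2)` gives a numerical check of the equal-spacing statement.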

An important fact about hyperbolic translations is
that each has a unique axis. This is different from Euclidean
translations, where it is only the direction of the
axis that matters, not the particular choice of axis.


Figure 3: Hyperbolic Translation. Triangle a'b'c' is obtained
by translating triangle abc along line l from b to
b'; the two triangles are congruent in hyperbolic space.

Rotations 

A rotation of $H^3$ about an axis $l$ through the origin is
the same as the Euclidean rotation about the same axis,
since this rotation preserves the unit ball. To compute
the matrix of rotation about an axis not passing through
the origin, we first translate $l$ to the origin, do the rotation
there, and then translate $l$ back to its original position.
The composition of these three transformations gives
a rotation about the original axis. In order for the angles
to work out right, we must translate along the unique
line through the origin perpendicular to $l$. If $l_0$ is the
point of $l$ closest to the origin, this is the translation
$T_{l_0,0}^{hyp}$.

Specifically, suppose $a$ and $b$ are points in $H^3$ and we
wish to rotate through an angle of $\theta$ about the line $l$
through $a$ and $b$. The point $l_0$ of $l$ closest to the origin
is given by

\[ l_0 = a \, \frac{(b - a) \cdot b}{(a - b) \cdot (a - b)} + b \, \frac{(a - b) \cdot a}{(b - a) \cdot (b - a)}. \qquad (6) \]


Note that in (6) $a$ and $b$ are the affine (not homogeneous)
coordinates of points in $H^3$, and $\cdot$ is the usual
dot product. The desired hyperbolic rotation is then

\[ R_{l,\theta}^{hyp} = T_{0,l_0}^{hyp} \, R_{u,\theta}^{euc} \, T_{l_0,0}^{hyp}, \qquad (7) \]

where $R_{u,\theta}^{euc}$ is the Euclidean rotation of $\mathbb{R}^3$ through an
angle of $\theta$ about an axis in the direction of $u$, where
$u = (a - b)/\|a - b\|$ is a unit vector in the direction of
$l$. $R_{u,\theta}^{euc}$ is given by ([3], p. 73)

\[ R_{u,\theta}^{euc} = \begin{pmatrix}
u_1^2 + c(1 - u_1^2) & u_1 u_2 c_1 - u_3 s & u_1 u_3 c_1 + u_2 s & 0 \\
u_1 u_2 c_1 + u_3 s & u_2^2 + c(1 - u_2^2) & u_2 u_3 c_1 - u_1 s & 0 \\
u_1 u_3 c_1 - u_2 s & u_2 u_3 c_1 + u_1 s & u_3^2 + c(1 - u_3^2) & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} \]

where $c = \cos(\theta)$, $s = \sin(\theta)$, and $c_1 = 1 - \cos(\theta)$.


Figure 4: Hyperbolic Rotations. Triangle a'b'c' is
obtained by rotating triangle abc about the point p
through an angle of $\pi/3$ radians. The other four triangles
are obtained by additional rotations through the
same angle. All six triangles are congruent in hyperbolic
space.

To give another example using the above triangle, we
compute the rotation about the line $l$ through the points
$p = (0.5, 0, 0)$ and $q = (0.5, 0, 1)$. This line is perpendicular
to the x-y plane (in both the Euclidean and hyperbolic
metrics) and hence this rotation preserves the
x-y plane.

The point $l_0$ from (6) is, of course, just $p$. Using
$u = (0, 0, 1)$ and $\theta = \pi/3$ in (7) we obtain

\[ R_{l,\pi/3}^{hyp} = \begin{pmatrix} 0.333 & -1 & 0 & 0.333 \\ 1 & 0.5 & 0 & -0.5 \\ 0 & 0 & 1 & 0 \\ -0.333 & -0.5 & 0 & 1.167 \end{pmatrix}. \qquad (8) \]

The images of $a$, $b$, and $c$ by this transformation are
then $a' = (0.364, -0.273, 0)$, $b' = (0.421, -0.789, 0)$, and
$c' = (-0.308, -0.692, 0)$. Figure 4 shows the resulting
triangle, as well as the next five images under the transformation
(8).
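
The rotation example can be sketched in Python by conjugating a Euclidean rotation with hyperbolic translations, as in (7). Everything below is an illustration rather than the authors' implementation: the helper names are hypothetical, translations are built from two reflections using the midpoint formula (4), and the closest-point helper uses the standard formula for the foot of the perpendicular from the origin to the line through a and b.

```python
import math


def minkowski(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] - a[3] * b[3]


def reflection(p):
    sign = (1.0, 1.0, 1.0, -1.0)  # diagonal of I^{3,1}
    pp = minkowski(p, p)
    return [[(1.0 if i == j else 0.0) - 2.0 * p[i] * p[j] * sign[j] / pp
             for j in range(4)] for i in range(4)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(4)) for i in range(4))


def to_affine(h):
    return tuple(x / h[3] for x in h[:3])


def translation(a, b):
    """Reflect in a, then in the hyperbolic midpoint of a and b."""
    ab = minkowski(a, b)
    wa = math.sqrt(minkowski(b, b) * ab)
    wb = math.sqrt(minkowski(a, a) * ab)
    m = tuple(wa * x + wb * y for x, y in zip(a, b))
    return matmul(reflection(m), reflection(a))


def closest_point_to_origin(a, b):
    """Point of the line through affine points a, b nearest the origin."""
    d = [x - y for x, y in zip(a, b)]                  # a - b
    dd = sum(x * x for x in d)
    wa = -sum(x * y for x, y in zip(d, b)) / dd        # (b - a).b / |a - b|^2
    wb = sum(x * y for x, y in zip(d, a)) / dd         # (a - b).a / |a - b|^2
    return tuple(wa * x + wb * y for x, y in zip(a, b))


def euclidean_rotation(u, theta):
    """Rotation by theta about the unit axis u through the origin."""
    u1, u2, u3 = u
    c, s, c1 = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [[u1 * u1 + c * (1 - u1 * u1), u1 * u2 * c1 - u3 * s, u1 * u3 * c1 + u2 * s, 0.0],
            [u1 * u2 * c1 + u3 * s, u2 * u2 + c * (1 - u2 * u2), u2 * u3 * c1 - u1 * s, 0.0],
            [u1 * u3 * c1 - u2 * s, u2 * u3 * c1 + u1 * s, u3 * u3 + c * (1 - u3 * u3), 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def hyperbolic_rotation(l0, u, theta):
    """Formula (7): translate l0 to the origin, rotate, translate back."""
    origin = (0.0, 0.0, 0.0, 1.0)
    h = tuple(l0) + (1.0,)
    return matmul(translation(origin, h),
                  matmul(euclidean_rotation(u, theta), translation(h, origin)))


l0 = closest_point_to_origin((0.5, 0.0, 0.0), (0.5, 0.0, 1.0))
R = hyperbolic_rotation(l0, (0.0, 0.0, 1.0), math.pi / 3)   # matrix (8)
a_img = to_affine(apply(R, (0.2, 0.0, 0.0, 1.0)))
```

The axis direction is passed in explicitly as `(0, 0, 1)`, matching the value of u used in the text's example, rather than being derived from the order of the two points.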


Figure  5:  Scene  from  the  video  Not  Knot.  This  scene 
shows  a  tesselation  of  hyperbolic  space  by  regular  right 
dodecahedra  —  analogous  to  a  tesselation  of  Euclidean 
space  by  cubes. 


Applications 

Three  recent  projects  at  the  Geometry  Center  have  ap¬ 
plied  these  ideas.  One  is  the  video  Not  Knot  [4].  This 
video,  whose  purpose  is  to  illustrate  some  of  the  basic 
concepts  of  knot  theory  and  the  theory  of  3-manifolds, 
includes  a  fly-through  scene  of  hyperbolic  3-space;  see 
Figure  5.  During  this  fly-through  one  easily  notices  that 
apparent  size  changes  more  rapidly  in  hyperbolic  space 
than  in  Euclidean  space.  Angles  appear  to  change  as 
we  move  closer  to  them.  In  fact,  however,  they  are  not 
changing  —  what  changes  is  our  perception  of  them. 

Another project which has used 4 x 4 matrix technology
in this way is a flight simulator for hyperbolic
space written by Linus Upson, a Princeton University
undergraduate working as a research assistant during
the summer of 1991. Patterned after the popular SGI
flight simulator, Upson's program allows one to navigate
through a scene in hyperbolic space; see Figure 6.
The program is excellent for conveying a sense
of how angles and distances seem to change with motion.
The intuition which one gains from this experience is
hard to pinpoint but extremely valuable in understanding
hyperbolic geometry.

The third Geometry Center project using hyperbolic
transformations is a general graphics library which
we call the “Object Oriented Graphics Language”
(OOGL), begun by Pat Hanrahan in the summer of
1989. This library provides a general framework in
which geometric objects and the actions which operate
on them may be specified arbitrarily. This makes
it easy to define and manipulate objects in hyperbolic
space. The interactive viewing program which accompanies
OOGL (MinneView) has a “hyperbolic mode”
in which the translations and rotations controlled by
mouse motions are hyperbolic rather than Euclidean. A
version of this program for SGI IRIS workstations may
be obtained on the Internet via anonymous ftp from
host geom.umn.edu (IP address 128.101.25.31).


Figure 6: Hyperbolic space flight simulator. This scene
shows the view from the cockpit of an airplane flying
over a hyperbolic plane in hyperbolic 3-space. The plane
is tessellated with regular right pentagons; it is essentially
a copy of Figure 1.


[4] Gunn, Charlie, et al. “Not Knot” [videotape]. Jones
and Bartlett. Copies of this video may be ordered
by contacting Jones and Bartlett Publishers, Inc.,
20 Park Plaza, Suite 1435, Boston, MA 02116-9792.

[5] Thurston, William. The Geometry and Topology
of Three-Manifolds, volume 1. Princeton University
Press, to appear. Chapters 1 and 2 provide a good
introduction to hyperbolic geometry.
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Human  Engineering  the  User  Interface  to  Spaceland 
Stuart  Card,  Xerox  Palo  Alto  Research  Center 


As  we  spread  our  wings  in  an  attempt  to  escape  user 
interface  Flatland,  it  is  useful  to  set  current  work  on 
interactive  3D  systems  in  the  context  of  work  on  human- 
computer  interaction  generally. 

Taking the long view, we can see the history of
human-computer interface design as a set of inventions
having different relative impact. In fact, it is interesting
to plot these on a sort of seismic scale of innovation
according to how much they shake the status quo. 3D
animated interactive graphical user interfaces look like
they will belong on the high end of this scale. We can look
in more detail at where we are by plotting the work of this
conference against a characterization of work in human-computer
interaction broadly defined. Such an analysis
reveals work mainly on the computational side, with
some attention to applications. In the end, computer
systems, to be successful, involve arranging a fit among the
system, the context of use, and human characteristics. It
is work on the fit to human characteristics that is most
lacking. Technology often develops through a cycle of
point designs, abstraction, characterization, and articulation
of design principles (not necessarily in that order).
While much of the progress in interactive 3D interfaces
will continue to be the result of intuitive and analogical
point design, it is my contention that we can already begin
to pursue the abstraction and characterization of parts of
the design space.

I will give examples of abstractions that attempt to
relate the missing human characteristic point of the
system-use-human triangle. These will include perceptual,
motor, and cognitive interactions and also the characteristics
of the task environment. Such abstractions can
be used in design as "tools for thought" to speed the
identification of interesting parts of the vast new user
interface Spaceland now open for exploration.
Funkhouser, Séquin, and Teller, “Management of Large Amounts of Data in Interactive Building Walkthroughs”


I. The sixth floor of the building model (242,668 faces).
The eye-to-cell visibility set (30,265 faces) for a
typical observer viewpoint is outlined in blue.

II. Cell-to-cell visibility and polyhedral bounds
on the visible portions of reached cells for the
cell containing the observer of Plate I.

III. Eye-to-object visibility for the observer
of Plate I. Wireframe objects are incident upon
visible cells but not in the potentially visible set.

IV. Another typical observer viewpoint. Visible
objects are rendered using the highest level of
detail for every object (23,468 faces are drawn).

V. Same viewpoint as in Plate IV. Detail has
been reduced for objects that appear small to
the observer (7,555 faces are drawn).

VI. Same viewpoint as in Plate IV. Shading represents
the level of detail chosen for each object in Plate V.
Darker shades represent higher levels of detail.


Figure 1: A sample trial from Experiment 1: Effect of
shadow sharpness on the perception of object size and
position.


Figure 2: Shadow sharpness levels used in Experiments 1
and 3. The sharpness levels are (from left to right): no
shadows, hard shadows, and soft shadows.


Figure 3: A sample trial from Experiment 2: Effect of
shadow shape on the perception of object size and
position.


Figure 4: Shadow shape levels used in Experiment 2. The
shape levels are (from left to right): no shadows, true
shadows, and bounding volume shadows.


Figure 5: A sample trial from Experiment 3: Effect of
shadow sharpness on the perception of object shape.


Figure 6: An example of the detrimental effect of soft
shadows in Experiment 3. The ball and pear shapes (left
and right objects in each image pair respectively) are
distinguished by the tapered end of the pear shape when hard
shadows are present. The feature is obscured under soft
shadows (the pair of images to the right), causing confusion
between the two shapes.


Wanger, “The Effect of Shadow Quality on the Perception of Spatial Relationships in Computer Generated Imagery”


Color Plate 1: Color detail of MusicWorld. “Device Synchronization Using an Optimal Linear Filter”,
Martin Friedmann, Thad Starner & Alex Pentland.
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PLATE  1:  Front-to-back  antialiased  rendering  without  sub-pixel  bit  masks. 
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Plate  1  Plate  2 

128x128x124 CT study of a child.
Data is courtesy of Dr. Frans Zonneveld,
Philips Medical Systems, The Netherlands.


Neumann, “Interactive Volume Rendering on a Multicomputer”




Photo 1 - Living Room in Walkthrough
Photo 2 - Kitchen in Walkthrough
Photo 3 - Fireplace with Animated Fire
Photo 4 - Landscape in Head-Mount Bike
Photo 5 - Environment Mapped Teapot
Photo 6 - Animated Water Waves

(1) 640x512 resolution, one sample/pixel
(2) 1280x1024 resolution, one sample/pixel
(3) 1280x1024 resolution, 56 samples/pixel
Permission granted to reproduce these pictures.

Rhoades,  Turk,  Bell,  State,  Neumann,  and  Varshney,  “Real-Time  Procedural  Textures” 


Color Plate 1
A tree trunk being extruded.
The user is roughly five times
taller than the houses.

Color Plate 2
A palm branch is being marked for
copying using a rubber banding box.

Color Plate 3
The branch has been copied four times.
Part of a toolbox menu is visible.
The red ring is the “magic carpet”,
showing the tracker range.

Color Plate 4
The user is now normal size.
The airplane is the cursor, which indicates
that the “flying” tool is being used.


Butterworth, Davidson, Hench, and Olano, “3DM: A Three Dimensional Modeler Using a Head-Mounted Display”
Plate 1: This shows the screen presented to the user in the Volume Seedlings system. The slicer/isosurface interface is in
the upper left. The purple outline indicates the slicing plane. The red areas of the isosurface indicate areas on or near the
slicing plane. Areas behind the slicing plane are rendered using “screen door” transparency. Seeds can be deposited on the
slicing plane which will then affect the subsequent rendering of the volume in the upper right.


Cohen,  Painter,  Mehta,  and  Ma,  “Volume  Seedlings” 
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Plate  1:  (upper  left)  Helo- 
cow  stalking  a  M-106  Self- 
propelled  mortar. 

Plate  2:  (upper  right)  Results 
of  collision  between  a  M-35 
2  1/2  ton  truck  traveling  at 
medium  speed  and  a  tree. 

Plate  3:  (middle  left)  The 
Helo-cow’s  round  nearly  im¬ 
pacts  with  an  M-2FAADS 
tank. 


Plate 4: (bottom left) Multiple
formations of tanks and
aircraft on tracks generated
by NPSNET-MES. V-22
Ospreys and AH-1T Cobras
provide close air support.


Zyda,  Pratt,  Monahan,  and  Wilson, 
“NPSNET:  Constructing  a  3D 
Virtual  World” 
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Plate  1:  Protein  with  760  atoms. 


Plate 2: Two helices before rotations.


Plate 3: Two helices after rotations.


Plate 4: Non-bonded interactions represented
with partial wireframe spheres.


Surles, “Interactive Modeling Enhanced with Constraints and Physics - With Applications in Molecular Modeling”
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Conner,  Snibbe,  Herndon,  Robbins,  Zeleznik,  and  van  Dam,  ‘Three-Dimensional  Widgets” 


Color Plate IV. The rack widget is used to perform deformations on any geometric object, here a cube (a). Dragging
the blue handle downward tapers the cube (b). Deformations are applied to the region of the cube between
the blue and pink bars. Rotating the pink handle twists the cube about the gold bar (c); pulling the red handle
upward bends the cube (d). Finally, below, we deform a geometric model of a pocket knife using the rack (e).




Color Plate 1. One of the models used in the abstract beam targeting task. The white lines, which were
not displayed to the subject, represent the double cone of HLS color space, in which the dodge balls are
randomly distributed and colored according to their position in HLS space. The multi-colored target ball
is at the center of the double cone.


Chung, “A Comparison of Head-tracked and Non-head-tracked Steering Modes in the Targeting of Radiotherapy
Treatment Beams”
A topological sphere knotted in 4-space. The hither clipping
plane in 3-space reveals some of the internal geometric
complexity of the surface.


Torus imbedded in 4-space as (cos s, sin s, cos t, sin t). The inner
core of the torus is farther in w than the outer core, hence its color
is more amber, and its size is diminished by perspective.


The knotted sphere sliced into ribbons. The inter-ribbon gaps are
semi-transparent to suggest the continuity of the geometry. Note
the moiré patterns emerging in the middle.


Opacity increasing in the w-direction. The opaque interior
reinforces the interpretation that the inner part of the torus is
farther away in w than the outer part.


Klein bottle imbedded in 4-space, with color growing more amber
[...] but not in 4-space, as revealed by the differing colors on
either side of the intersection curve.


Torus viewed from 4 different [...]. As the eye [...] to the
right in 4-space, the farther neighborhoods shift right relative to
nearer neighborhoods; the farther, inner core of the torus
slides right compared to the outer core.


Banks, “Interactive Manipulation and Display of Two-Dimensional Surfaces in Four-Dimensional Space”
A projective plane that self-intersects in 4-space. The upper part
of the vertical self-intersection persists under all rotations in
4-space.


Hither clipping the opaque projective plane reveals the bottom of
the black intersection curve. The curve is thinner where the
intersecting patches dive steeply through each other.


Rotated view of the projective plane. The intersection curve has a
terminus at the top of the figure and another terminus midway
down the surface, which the frontmost neighborhood hides.


Clipping to reveal internal silhouettes. Their width varies with the
surface's curvature. There is a false silhouette where the bottom of
the surface inflects in the eye plane across the curve.


Projective plane rendered with transparency. The interior parts of
the surface are visible, showing the lower terminus of the
intersection curve. But the intersection curve is less prominent.


Projective plane rendered with transparency, silhouettes, and
intersections.


Banks, “Interactive Manipulation and Display of Two-Dimensional Surfaces in Four-Dimensional Space”
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